Busy semester!!!

Over the past few weeks I have been working at making OMP and OLS as efficient as possible. Now I think I have it to where the only place where they can become more efficient is in the dictionary search. Given we have selected \(k\) atoms, OMP selects the next atom by $$\min_{n \in \Omega} \| \vr_k - \langle \vr_k, \phi_n \rangle \phi_n \|_2 = \max_{n \in \Omega} |\langle \vr_k, \phi_n \rangle| $$ where the dictionary \(\MPhi\) has unit-norm atoms \(\phi_n\) as columns, and \(\vr_k\) is the orthogonal projection of \(\vu\) onto the span of the \(k\) selected atoms, which I denote as the matrix \(\MPhi_k\). OLS selects the next atom by $$\min_{n \in \Omega}\left \| \MP_k^\perp \vu - \frac{\langle \MP_k^\perp \vu, \phi_n \rangle \MP_k^\perp\phi_n }{\|\MP_k^\perp\phi_n\|_2^2}\right \|_2 = \max_{n \in \Omega} \frac{|\langle \MP_k^\perp \vu, \phi_n \rangle|}{\|\MP_k^\perp\phi_n\|_2^2} $$ where \(\MP_k^\perp\) is the orthogonal projection onto the subspace orthogonal to the span of the \(k\) selected atoms. Consider we have the QR decomposition of the dictionary submatrix, \(\MPhi_k = \MQ_k\MR_k\). This means \(\MQ_k\) is an orthonormal basis for the span of the \(k\) selected atoms, and \(\MR_k\) is an upper triangular matrix describing how to build the \(k\) atoms from the basis. Since \(\vr_k = \vu - \MQ_k\MQ_k^T\vu = \MP_k^\perp \vu\), the OMP selection criterion becomes $$\max_{n \in \Omega} |\langle \vu - \MQ_k\MQ_k^T\vu, \phi_n \rangle| = \max_{n \in \Omega} |\langle \vu,\phi_n \rangle - \langle \MQ_k\MQ_k^T\vu, \phi_n \rangle| $$ and the OLS selection criterion becomes $$\max_{n \in \Omega} \frac{|\langle \vu - \MQ_k\MQ_k^T\vu, \phi_n \rangle|}{\|\phi_n - \MQ_k\MQ_k^T\phi_n\|_2^2} = \max_{n \in \Omega} \frac{|\langle \vu,\phi_n \rangle - \MQ_k\MQ_k^T\vu, \phi_n \rangle|}{\|\phi_n - \MQ_k\MQ_k^T\phi_n\|_2^2} $$ The first term we need compute only once at the beginning of the decomposition. So the second term is what we need to update efficiently.

Over the past few weeks I have been working at making OMP and OLS as efficient as possible. Now I think I have it to where the only place where they can become more efficient is in the dictionary search. Given we have selected \(k\) atoms, OMP selects the next atom by $$\min_{n \in \Omega} \| \vr_k - \langle \vr_k, \phi_n \rangle \phi_n \|_2 = \max_{n \in \Omega} |\langle \vr_k, \phi_n \rangle| $$ where the dictionary \(\MPhi\) has unit-norm atoms \(\phi_n\) as columns, and \(\vr_k\) is the orthogonal projection of \(\vu\) onto the span of the \(k\) selected atoms, which I denote as the matrix \(\MPhi_k\). OLS selects the next atom by $$\min_{n \in \Omega}\left \| \MP_k^\perp \vu - \frac{\langle \MP_k^\perp \vu, \phi_n \rangle \MP_k^\perp\phi_n }{\|\MP_k^\perp\phi_n\|_2^2}\right \|_2 = \max_{n \in \Omega} \frac{|\langle \MP_k^\perp \vu, \phi_n \rangle|}{\|\MP_k^\perp\phi_n\|_2^2} $$ where \(\MP_k^\perp\) is the orthogonal projection onto the subspace orthogonal to the span of the \(k\) selected atoms. Consider we have the QR decomposition of the dictionary submatrix, \(\MPhi_k = \MQ_k\MR_k\). This means \(\MQ_k\) is an orthonormal basis for the span of the \(k\) selected atoms, and \(\MR_k\) is an upper triangular matrix describing how to build the \(k\) atoms from the basis. Since \(\vr_k = \vu - \MQ_k\MQ_k^T\vu = \MP_k^\perp \vu\), the OMP selection criterion becomes $$\max_{n \in \Omega} |\langle \vu - \MQ_k\MQ_k^T\vu, \phi_n \rangle| = \max_{n \in \Omega} |\langle \vu,\phi_n \rangle - \langle \MQ_k\MQ_k^T\vu, \phi_n \rangle| $$ and the OLS selection criterion becomes $$\max_{n \in \Omega} \frac{|\langle \vu - \MQ_k\MQ_k^T\vu, \phi_n \rangle|}{\|\phi_n - \MQ_k\MQ_k^T\phi_n\|_2^2} = \max_{n \in \Omega} \frac{|\langle \vu,\phi_n \rangle - \MQ_k\MQ_k^T\vu, \phi_n \rangle|}{\|\phi_n - \MQ_k\MQ_k^T\phi_n\|_2^2} $$ The first term we need compute only once at the beginning of the decomposition. So the second term is what we need to update efficiently.

We can iteratively update \(\langle \MQ_k\MQ_k^T\vu, \phi_n \rangle\)
since \(\MQ_k = [\MQ_{k-1} | \vq_k ]\) where \(\vq_k\) is the new column added to the QR decomposition when OMP selects \(\phi_{n_k}\).
It is simply given by
$$
\vq_k = \frac{ \phi_{n_k} - \MQ_{k-1}\MQ_{k-1}^T\phi_{n_k}}{\|\phi_{n_k} - \MQ_{k-1}\MQ_{k-1}^T\phi_{n_k}\|_2}.
$$
Thus, we see that for all atoms in the dictionary
$$
\langle \MQ_k\MQ_k^T\vu, \phi_n \rangle =
\langle \MQ_{k-1}\MQ_{k-1}^T\vu, \phi_n \rangle +
\langle \vq_k \vq_k^T \vu, \phi_n \rangle.
$$
Considering that we have already calculated and stored the first term,
we only need evaluate the second for all atoms in the dictionary.
Notice that nowhere in these iterations does OMP or OLS need to compute the actual solution.
That comes only at the very end!

Each time OMP and OLS selects an atom, it must update the QR decomposition. This can be done efficiently by the following approach. First the algorithm updates the upper triangular matrix: $$\begin{equation} \MR_k = \left [ \begin{array}{cc} \MR_{k-1} & \vw \\ \mathbf{0}^T & \sqrt{1 - \|\vw\|_2^2} \end{array} \right ] \end{equation}$$ where \(\MR_{k-1}^T\vw = \MPhi_{\Omega_{k-1}}^T\phi_{n_k}\). This can be solved efficiently by back substitution since the left side matrix is triangular. Also, we must have already computed \(\MPhi_{\Omega_{k-1}}^T\phi_{n_k}\), which we can do at the very beginning of the decomposition by computing the Gramian \(\MPhi^T\MPhi\). Then, to update the orthonormal basis we perform $$\begin{equation} \MQ_{k} = \left [ \begin{array}{cc} \MQ_{k-1} & (\MI - \MQ_{k-1}\MQ_{k-1}^T)\phi_{n_k} / \sqrt{1 - \|\vw\|_2^2} \end{array} \right ]. \end{equation}$$ Notice that \(\|\phi_{n_k} - \MQ_{k-1}\MQ_{k-1}^T\phi_{n_k}\|_2 = \sqrt{1 - \|\vw\|_2^2}\), so we can save some computation there as well. OLS has the added computation of the normalization factor for each atom. This can be done efficiently in an iterative fashion.

Finally, OMP and OLS need to compute the solution by \(\MR_k^T\MR_k\vs_k = \MPhi_k^T\vu\). This is quite efficiently done by solving first \(\MR_k^T\vv = \MPhi_k^T\vu\), and finally, \(\MR_k\vs_k = \vv\). Then we put the non-zero elements of \(\vs_k\) into the correct places of the longer solution vector \(\vs\).

Below we see the empirical phase transition for OMP found using this code. I am averaging 50 trials, where the condition for exact recovery is the full and exact recovery of the solution support, nothing more and nothing less.

And below is the empirical phase transition for OLS found using this code. The line appears jaggedy because I am only averaging the results of 50 trials.

There may be some more spots in which these algorithms can be sped up even more, but I think their main bottleneck is the search for the maximum projection.

Each time OMP and OLS selects an atom, it must update the QR decomposition. This can be done efficiently by the following approach. First the algorithm updates the upper triangular matrix: $$\begin{equation} \MR_k = \left [ \begin{array}{cc} \MR_{k-1} & \vw \\ \mathbf{0}^T & \sqrt{1 - \|\vw\|_2^2} \end{array} \right ] \end{equation}$$ where \(\MR_{k-1}^T\vw = \MPhi_{\Omega_{k-1}}^T\phi_{n_k}\). This can be solved efficiently by back substitution since the left side matrix is triangular. Also, we must have already computed \(\MPhi_{\Omega_{k-1}}^T\phi_{n_k}\), which we can do at the very beginning of the decomposition by computing the Gramian \(\MPhi^T\MPhi\). Then, to update the orthonormal basis we perform $$\begin{equation} \MQ_{k} = \left [ \begin{array}{cc} \MQ_{k-1} & (\MI - \MQ_{k-1}\MQ_{k-1}^T)\phi_{n_k} / \sqrt{1 - \|\vw\|_2^2} \end{array} \right ]. \end{equation}$$ Notice that \(\|\phi_{n_k} - \MQ_{k-1}\MQ_{k-1}^T\phi_{n_k}\|_2 = \sqrt{1 - \|\vw\|_2^2}\), so we can save some computation there as well. OLS has the added computation of the normalization factor for each atom. This can be done efficiently in an iterative fashion.

Finally, OMP and OLS need to compute the solution by \(\MR_k^T\MR_k\vs_k = \MPhi_k^T\vu\). This is quite efficiently done by solving first \(\MR_k^T\vv = \MPhi_k^T\vu\), and finally, \(\MR_k\vs_k = \vv\). Then we put the non-zero elements of \(\vs_k\) into the correct places of the longer solution vector \(\vs\).

Below we see the empirical phase transition for OMP found using this code. I am averaging 50 trials, where the condition for exact recovery is the full and exact recovery of the solution support, nothing more and nothing less.

And below is the empirical phase transition for OLS found using this code. The line appears jaggedy because I am only averaging the results of 50 trials.

There may be some more spots in which these algorithms can be sped up even more, but I think their main bottleneck is the search for the maximum projection.

You may be interested in this:

http://venefrombucharest.wordpress.com/2011/08/07/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-1/

http://venefrombucharest.wordpress.com/2011/08/11/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-2/

It's also about optimizing OMP. Although it's about a Python implementation, I think some of the remarks are more general.

You may be interested in this:

http://venefrombucharest.wordpress.com/2011/08/07/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-1/

http://venefrombucharest.wordpress.com/2011/08/11/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-2/

It is also about optimizing OMP. Although it's a Python implementation, some remarks may be interesting.