Hello, and welcome to Paper of the Day (Po'D): On Compressed Sensing and the Estimation of Continuous Parameters Edition. Today's paper is: J. K. Nielsen, M. G. Christensen and S. H. Jensen, "On compressed sensing and the estimation of continuous parameters from noisy observations," rejected four times and still not published.

# October 2011 Archives

In my continuing analysis of the recovery performance of CMP, I have compared the computational cost of the algorithm to both OMP and MP with debiasing (MP+).
The nice figure below shows as a function of the phase plane, the average cost of attempting to recover a signal using one of three algorithms. Sandwiched between MP+ and OMP (using a very lightweight QR implementation) is CMP, which has a cost that rises quickly once the sparsity exceeds a particular level.
Most intriguing though is that CMP (with no more than two refinement cycles each model augmentation) and OMP show the same empirical phase transition (50% recovery, black line) --- at least at this problem dimension of 400 --- but the boundary separating where CMP becomes more expensive than OMP is higher (red dashed).

So there it is, proof that CMP is less computationally complex than OMP for the same recovery performance w.r.t. the majority exact recovery (full support recovery, nothing more and nothing less). Now, what about 90% recovery?

So there it is, proof that CMP is less computationally complex than OMP for the same recovery performance w.r.t. the majority exact recovery (full support recovery, nothing more and nothing less). Now, what about 90% recovery?

After I pointed out something was not right with the scikit-learn implementation of OMP, Vlad N. and Alejandro W. took up the challenge and found the very subtle reason for its mysterious behavior: column swapping gone awry. I have now tested the new implementation against my slower one, and the results show a nearly perfect match to the expected behavior ... maybe even a little better than the QR implementation. Good work!

I am now busy analyzing the results of our multiple simulations of cyclic greedy approaches to signal recovery from compressive measurements. Next month I will present a portion of this work at the 2011 Asilomar Conference on Signals, Systems and Computers. We are investigating a basic question: can we make the pure greedy algorithm (matching pursuit) competitive with more complex solvers of inverse problems and maintain its simplicity and low computational overhead? The answer is yes, but with a few additions.

- We must perform a debiasing step at the end of the pursuit to trim the support set, which has minimal complexity since it is only done once no matter the expected sparsity. This can be done by an orthogonal projection onto the selected features, and a hard thresholding of the weights. (Note that this step is also necessary with other more complex approaches, such as \(ell_1\) minimization (basis pursuit) and gradient projection for sparse representation.)
- We take a cyclic approach, wherein we employ the pure greedy algorithm again. This adds to the computational cost.

Continue reading Signal recovery with pure greed.

With some help from Vlad N., and Alejandro Weinstein, I was able to translate my MATLAB implementation of OMP with QR factorization into Python. And below are the results I am seeing with an exact recovery condition of Maleki and Donoho, i.e., \(\|\vx - \hat\vx\|_2^2 < 10^{-4}\|\vx\|_2^2\). This is exactly what should happen.

And here is the phase transition for exact support recovery.

Here is my code: omp_phase3.py. It doesn't run as fast as I was hoping, and in fact takes about 100 times as long as MATLAB. Maybe I am doing some unnecessary copying to and from memory, transpositions, and whatever.

And here is the phase transition for exact support recovery.

Here is my code: omp_phase3.py. It doesn't run as fast as I was hoping, and in fact takes about 100 times as long as MATLAB. Maybe I am doing some unnecessary copying to and from memory, transpositions, and whatever.

While some other experiments were running, I spent some time this weekend learning Python so I can explore the possibilities of running my code run faster, and outside of MATLAB. Alejandro let me know about the work being done by Vlad here on optimizing OMP. So, as a first step, I pieced together the code here, and computed the phase transition of OMP for sparse signals distributed Gaussian. I am using the implementation of OMP that is a part of scikitlearn, and decomposing up to the sparsity of the signal. Below is the empirical phase transition for 800 dimensional signals (100 trials at each pair), and exact recovery defined as \(\|\vx - \hat\vx\|_2^2 < 10^{-2}\|\vx\|_2^2\). Except for the speed at which this code computed the results, something is not right here! Increasing the number of measurements should never hurt recovery.

Tonight I will have a closer look at that OMP implementation. My other MATLAB experiments are still running!

Tonight I will have a closer look at that OMP implementation. My other MATLAB experiments are still running!

Busy semester!!!

Over the past few weeks I have been working at making OMP and OLS as efficient as possible. Now I think I have it to where the only place where they can become more efficient is in the dictionary search. Given we have selected \(k\) atoms, OMP selects the next atom by $$\min_{n \in \Omega} \| \vr_k - \langle \vr_k, \phi_n \rangle \phi_n \|_2 = \max_{n \in \Omega} |\langle \vr_k, \phi_n \rangle| $$ where the dictionary \(\MPhi\) has unit-norm atoms \(\phi_n\) as columns, and \(\vr_k\) is the orthogonal projection of \(\vu\) onto the span of the \(k\) selected atoms, which I denote as the matrix \(\MPhi_k\). OLS selects the next atom by $$\min_{n \in \Omega}\left \| \MP_k^\perp \vu - \frac{\langle \MP_k^\perp \vu, \phi_n \rangle \MP_k^\perp\phi_n }{\|\MP_k^\perp\phi_n\|_2^2}\right \|_2 = \max_{n \in \Omega} \frac{|\langle \MP_k^\perp \vu, \phi_n \rangle|}{\|\MP_k^\perp\phi_n\|_2^2} $$ where \(\MP_k^\perp\) is the orthogonal projection onto the subspace orthogonal to the span of the \(k\) selected atoms. Consider we have the QR decomposition of the dictionary submatrix, \(\MPhi_k = \MQ_k\MR_k\). This means \(\MQ_k\) is an orthonormal basis for the span of the \(k\) selected atoms, and \(\MR_k\) is an upper triangular matrix describing how to build the \(k\) atoms from the basis. Since \(\vr_k = \vu - \MQ_k\MQ_k^T\vu = \MP_k^\perp \vu\), the OMP selection criterion becomes $$\max_{n \in \Omega} |\langle \vu - \MQ_k\MQ_k^T\vu, \phi_n \rangle| = \max_{n \in \Omega} |\langle \vu,\phi_n \rangle - \langle \MQ_k\MQ_k^T\vu, \phi_n \rangle| $$ and the OLS selection criterion becomes $$\max_{n \in \Omega} \frac{|\langle \vu - \MQ_k\MQ_k^T\vu, \phi_n \rangle|}{\|\phi_n - \MQ_k\MQ_k^T\phi_n\|_2^2} = \max_{n \in \Omega} \frac{|\langle \vu,\phi_n \rangle - \MQ_k\MQ_k^T\vu, \phi_n \rangle|}{\|\phi_n - \MQ_k\MQ_k^T\phi_n\|_2^2} $$ The first term we need compute only once at the beginning of the decomposition. So the second term is what we need to update efficiently.

Over the past few weeks I have been working at making OMP and OLS as efficient as possible. Now I think I have it to where the only place where they can become more efficient is in the dictionary search. Given we have selected \(k\) atoms, OMP selects the next atom by $$\min_{n \in \Omega} \| \vr_k - \langle \vr_k, \phi_n \rangle \phi_n \|_2 = \max_{n \in \Omega} |\langle \vr_k, \phi_n \rangle| $$ where the dictionary \(\MPhi\) has unit-norm atoms \(\phi_n\) as columns, and \(\vr_k\) is the orthogonal projection of \(\vu\) onto the span of the \(k\) selected atoms, which I denote as the matrix \(\MPhi_k\). OLS selects the next atom by $$\min_{n \in \Omega}\left \| \MP_k^\perp \vu - \frac{\langle \MP_k^\perp \vu, \phi_n \rangle \MP_k^\perp\phi_n }{\|\MP_k^\perp\phi_n\|_2^2}\right \|_2 = \max_{n \in \Omega} \frac{|\langle \MP_k^\perp \vu, \phi_n \rangle|}{\|\MP_k^\perp\phi_n\|_2^2} $$ where \(\MP_k^\perp\) is the orthogonal projection onto the subspace orthogonal to the span of the \(k\) selected atoms. Consider we have the QR decomposition of the dictionary submatrix, \(\MPhi_k = \MQ_k\MR_k\). This means \(\MQ_k\) is an orthonormal basis for the span of the \(k\) selected atoms, and \(\MR_k\) is an upper triangular matrix describing how to build the \(k\) atoms from the basis. Since \(\vr_k = \vu - \MQ_k\MQ_k^T\vu = \MP_k^\perp \vu\), the OMP selection criterion becomes $$\max_{n \in \Omega} |\langle \vu - \MQ_k\MQ_k^T\vu, \phi_n \rangle| = \max_{n \in \Omega} |\langle \vu,\phi_n \rangle - \langle \MQ_k\MQ_k^T\vu, \phi_n \rangle| $$ and the OLS selection criterion becomes $$\max_{n \in \Omega} \frac{|\langle \vu - \MQ_k\MQ_k^T\vu, \phi_n \rangle|}{\|\phi_n - \MQ_k\MQ_k^T\phi_n\|_2^2} = \max_{n \in \Omega} \frac{|\langle \vu,\phi_n \rangle - \MQ_k\MQ_k^T\vu, \phi_n \rangle|}{\|\phi_n - \MQ_k\MQ_k^T\phi_n\|_2^2} $$ The first term we need compute only once at the beginning of the decomposition. So the second term is what we need to update efficiently.

Continue reading Highly Efficient OMP and OLS.

Bob L. Sturm, Associate Professor

Audio Analysis Lab

Aalborg University Copenhagen

A.C. Meyers Vænge 15

DK-2450 Copenahgen SV, Denmark

Email: bst_at_create.aau.dk