Hello, and welcome to Paper of the Day (Po'D): On Compressed Sensing and the Estimation of Continuous Parameters Edition. Today's paper is: J. K. Nielsen, M. G. Christensen and S. H. Jensen, "On compressed sensing and the estimation of continuous parameters from noisy observations," rejected four times and still not published.

This paper conclusively demonstrates that sparse representation with compressed sensing does harm for estimating the continuous parameters underlying some noisy signal that is sparse, or approximately so, in a dictionary. The first problem, introduced by sparse representation, is the assumption that our set of signals \(\{\vu\}\) are sparse in one set of functions \(\mathbf{\Psi}\):
$$
\{\vx : \vu = \mathbf{\Psi}\vx + \vn \}
$$
where each \(\vx\) is mostly zero, and each \(\vn\) is an error in modeling and independent noise.
When we know \(\mathbf{\Psi}\), then we have no problem other than recovering the underlying sparse vector from the observation.
This can be done using any of a number of sparse approximation algorithms.
In the real world, however, we may not know \(\mathbf{\Psi}\).
Considering acoustic signals, it makes little sense to assume they are all composed of a finite handful of particular frequencies.
In this case, we assume each atom of \(\mathbf{\Psi}\) is a sinusoid with a continuously distributed frequency and phase, and then the problem is that we wish to estimate these parameters for each observation \(\vu\).

Now comes compressed sensing, which multiplies the above from the left a \(M\times N\) sensing matrix \(\MPhi\), and reduces the dimensionality of the problem: $$ \{\vx : \MPhi\vu = \MPhi\mathbf{\Psi}\vx + \MPhi\vn \}. $$ This introduces the second problem. So, we wish to still estimate the continuous parameters underlying a \(\vu\), but now we must do it from the smaller \(\MPhi\vu\). It should be quite clear here that unless our sensing matrix has some special properties, we cannot do better than if we use \(\vu\).

In this paper, the authors show in a straightforward manner that for several popular sensing matrices the error variance of the \(k\)th parameter obeys $$ \sigma^2_{k,CS} \ge \frac{N}{M} [\MI^{-1}(\mathbf{\theta})]_{kk} $$ where \([\MI^{-1}(\mathbf{\theta})]_{kk}\) is the Cramer-Rao lower bound of the variance of the \(k\)th parameter estimate error. So, we hurt ourselves by compressed sensing. The more severely we undersample, the worse our estimates become. But this is not necessarily so. If we know the subspace in which our signal projects, which amounts to knowing its exact parameters, then we are ok. But in the real world, this is likely not to be the case.

Now comes compressed sensing, which multiplies the above from the left a \(M\times N\) sensing matrix \(\MPhi\), and reduces the dimensionality of the problem: $$ \{\vx : \MPhi\vu = \MPhi\mathbf{\Psi}\vx + \MPhi\vn \}. $$ This introduces the second problem. So, we wish to still estimate the continuous parameters underlying a \(\vu\), but now we must do it from the smaller \(\MPhi\vu\). It should be quite clear here that unless our sensing matrix has some special properties, we cannot do better than if we use \(\vu\).

In this paper, the authors show in a straightforward manner that for several popular sensing matrices the error variance of the \(k\)th parameter obeys $$ \sigma^2_{k,CS} \ge \frac{N}{M} [\MI^{-1}(\mathbf{\theta})]_{kk} $$ where \([\MI^{-1}(\mathbf{\theta})]_{kk}\) is the Cramer-Rao lower bound of the variance of the \(k\)th parameter estimate error. So, we hurt ourselves by compressed sensing. The more severely we undersample, the worse our estimates become. But this is not necessarily so. If we know the subspace in which our signal projects, which amounts to knowing its exact parameters, then we are ok. But in the real world, this is likely not to be the case.

Hi Bob,

At Spars11 Gilles and Manuel mentioned to me a work on "Continuous Basis Pursuit" from Ekanadham et al. that appears to address the issue of recovering continuous parameters. The original method essentially does linear interpolation between the sampled bases---using a derivative term to express the deviation from the atoms on the grid. For the program to be convex you must restrict weights to be positive, but I think this can be worked around.

An updated paper is here, with supposedly improved interpolation schemes:

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5893953&tag=1

It also shows the unnecessary density that may result from mismatched atoms.

Perhaps a hybrid approach between greedy (fast) and convex methods is possible. One approach would be to refine the max correlation atom once you've chosen it by solving a small interpolation problem. Another approach would be to replace the correlation computation with max-correlation or min error over the small interpolation domain for that atom entirely and select the argmax atom.

cheers.

"It should be quite clear here that unless our sensing matrix has some special properties, we cannot do better than if we use u.

[...]

So, we hurt ourselves by compressed sensing."

I have a philosophical disagreement with this statement. I think you don't use compressive sensing (CS) because you want to "do better" than just using $u$. In fact, if you have $u$ in your hands (is already on your computer, to be precise), you do nothing but make your life more complicated by multiplying $u$ by $\Phi$. CS helps when taking measurements is expensive or time consuming (reducing the time it takes to get an MRI is a great example). If you have no problem in sampling your signal, e.g. an audio signal, I don't see the point of using CS.

Thanks Graham. I will have to take a look at that work. Note that continuous parameter estimation within the sparse approximation framework is in: M. G. Christensen and S. H. Jensen, "On Perceptual Distortion Minimization and Nonlinear Least-Squares Frequency Estimation", IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 99-109, 2006.

Hi Alejandro. I agree. It is a question of costs vs. benefits. This work just shows what we expect all along: throwing away samples, even the CS way, hurts when your signal is sparse in some unknown basis.

Thanks Bob for recommending the paper. The work is quite interesting for the way it represents distortion measures so they are accessible to pursuit methods. However, from my initial reading, it seems that we are still using discrete frequency atoms on the DFT grid---Section III "It can be solved efficiently since the inner products ... can be found using FFTs." Are these atoms later refined to off-the-grid values?

Hi Graham. Indeed, the Vandermonde matrix (7) and its use in (9) embodies continuous frequencies. So do the estimators (26) and (43) and (65). The functional of (26) is continuous in frequency, and one can estimate its maximum by ideal interpolation of the DFT of the residual.