December 2010 Archives

Hello, and welcome to Paper of the Week (Po'W): Tuning Iterative Reconstruction Algorithms Edition. This week's paper is from my holiday reading list, and addresses the anonymous question on Nuit Blanche about the need for an approach to benchmarking: A. Maleki and D. L. Donoho, "Optimally tuned iterative reconstruction algorithms for compressed sensing," IEEE J. Selected Topics in Signal Process., vol. 4, pp. 330-341, Apr. 2010.

In the interest of extreme brevity, here is my one line description of the work in this paper:
The authors produce complete recipes of tuned algorithms for out-of-the-box iterative signal reconstruction from compressive measurements.

Let the parrots hit the floor


Holiday reading!

As I am leaving my office in Denmark for the holidays, I have planned a short reading list on the topics most near to my heart.

On addressing my frustration with explaining the strange dependence of sparse signal recovery by OMP (and its ilk) on the coefficient priors:
  1. A. Maleki and D. L. Donoho, "Optimally tuned iterative reconstruction algorithms for compressed sensing," IEEE J. Selected Topics in Signal Process., vol. 4, pp. 330-341, Apr. 2010.
  2. M. A. Davenport and M. B. Wakin, "Analysis of orthogonal matching pursuit using the restricted isometry property," IEEE Trans. Info. Theory, vol. 56, pp. 4395-4401, Sep. 2010.
  3. V. Cevher, "Learning with compressible priors," in Proc. Neural Info. Process. Syst., (Vancouver, BC, Canada), Dec. 2009.
  4. Y. Jin and B. D. Rao, "Performance limits of matching pursuit algorithms," in Proc. Int. Symp. Info. Theory, (Toronto, ON, Canada), pp. 2444-2448, July 2008.
  5. A. Barron, A. Cohen, W. Dahmen, and R. A. DeVore, "Approximation and learning by greedy algorithms," Annals of Statistics, vol. 36, no. 1, pp. 64-94, 2008.
  6. J. D. Blanchard, C. Cartis, J. Tanner, and A. Thompson, "Phase transitions for greedy sparse approximation algorithms," (submitted somewhere), Apr. 2010.
  7. E. J. Candès, Y. C. Eldar, and D. Needell, "Compressed sensing with coherent and redundant dictionaries," arXiv:1005.2613v1, May 2010.
  8. E. J. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. Info. Theory, vol. 52, pp. 5406-5425, Dec. 2006.
  9. J. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. part i: Greedy pursuit," Signal Process., vol. 86, pp. 572-588, Mar. 2006.
  10. J. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Info. Theory, vol. 50, pp. 2231-2242, Oct. 2004.
On other approaches for sparse signal recovery (for implementation and testing in MATLAB):
  1. J. A. Tropp and S. J. Wright, "Computational methods for sparse solution of linear inverse problems," Proc. IEEE, vol. 98, pp. 948-958, June 2010.
  2. E. Liu and V. Temlyakov, "Orthogonal super greedy algorithm and application in compressed sensing," Tech. Rep. 01, University of South Carolina, South Carolina, USA, 2010.
  3. K. Labusch, E. Barth, and T. Martinetz, "Bag of pursuits and neural gas for improved sparse coding," in Proc. Int. Conf. Computational Statistics, pp. 327-336, 2010.
  4. G. Peyré, "Best basis compressed sensing," IEEE Trans. Signal Process., vol. 58, pp. 2613-2622, May 2010.
  5. C. Herzet and A. Drémeau, "Bayesian pursuit algorithms," in Proc. European Signal Process. Conf., (Aalborg, Denmark), pp. 1474-1478, Aug. 2010.
Let's not forget about audio!
  1. M. Christensen and A. Jakobsson, Multi-pitch estimation, Morgan & Claypool Publishers, 2009.
  2. R. Bardeli and F. Kurth, "Robust identification of time-scaled audio," in Proc. AES Int. Conf., (London, U.K.), June 2004.
Fun bedtime reading and general relaxation:
  1. G. Isely, C. J. Hillar, and F. T. Sommer, "Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication," (submitted somewhere), Nov. 2010.
  2. T. Hromádka, M. R. DeWeese, and A. M. Zador, "Sparse representation of sounds in the unanesthetized auditory cortex," PLOS Biology, vol. 6, pp. 0124-0137, Jan. 2008.
  3. S. Kay, "A new approach to Fourier synthesis with application to neural encoding and speech classification," IEEE Signal Process. Lett., vol. 17, pp. 855-858, Oct. 2010.
  4. A. M. Bruckstein, D. L. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, pp. 34-81, Feb. 2009.
Then I have a course to prepare! That fun reading includes
  1. I. Millington, Artificial Intelligence for Games, Morgan Kaufmann, 2006.
  2. S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed., Prentice Hall, 2009.
  3. C. Reas and B. Fry, Getting Started with Processing, O'Reilly Media, Inc., 2008.
Will a holiday so busy be a holiday at all?

Quotes from Peer Review 2010

Here is an excellent end-of-year list of selected quotes from referees for the journal Environmental Microbiology. My favorites are:

  • The writing and data presentation are so bad that I had to leave work and go home early and then spend time to wonder what life is about.
  • The biggest problem with this manuscript, which has nearly sucked the will to live out of me, is the terrible writing style.
  • I have to admit that I would have liked to reject this paper because I found the tone in the Reply to the Reviewers so annoying. It may be irritating to deal with reviewer's comments (believe me, I know!) but it is not wise to let your irritation seep through every line you write!
  • It is sad to see so much enthusiasm and effort go into analyzing a dataset that is just not big enough.
  • Done! Difficult task, I don't wish to think about constipation and faecal flora during my holidays! But, once a referee, always and anywhere a referee; we are good boy scouts in the research wilderness. Even under the sun and near a wonderful beach.
I wonder, is there such a thing from the signal processing literature?

A reviewer of one of our past journal papers (B. L. Sturm, J. J. Shynk, L. Daudet, and C. Roads, "Dark energy in sparse atomic estimations," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, pp. 671-676, Mar. 2008) stated,
[Your term] "dark energy" is reasonable, but a bit non-scientific or fad driven in my opinion. I don't have a strong objection to the 'dark matter' connotations (except that dark energy in a signal model is nowhere near as ubiquitous as dark matter in the universe!), but if we're going to be theoretical astrophysicists here, I'm going to request some more careful statements throughout the manuscript!
A reviewer of one of my rejected articles stated,
The proposed constraints are said to be motivated by previous work of the authors, which this manuscript does motivate reading.
And a reviewer of another paper (B. L. Sturm, C. Roads, A. McLeran, and J. J. Shynk, "Analysis, visualization, and transformation of audio signals using dictionary-based methods," in Proc. Int. Computer Music Conf., (Belfast, Ireland), Aug. 2008), stated in part
The visual pleasure of the reader reaches a peak in Fig. 8 ...
which makes me blush.

Of course, I cannot divulge some of my peer-review statements, but I once told a group of students at their examination, "Your final report is missing an amazing amount of bullshit. Good work."

The Sounds of Star Wars

I want!

Warning: the following was written in real time during an attempt to prove my instinct correct, which in the end proved wrong. I am humbled.

I mentioned the other day a curious observation of OMP correcting itself when recovering sparse signals from compressive measurements. In my experiments, I gauged the success of OMP in recovering 13-sparse signals from 50 measurements (i.e., correlations with random vectors). My code overlooked stopping OMP at the known sparsity of the vector, and instead let OMP run for up to 100 iterations, or until the signal-to-residual energy ratio reached 100 dB. When I later changed the stopping criterion to 13 iterations, I noticed that the recovery rate was significantly reduced. A commenter here does not find that hard to believe, since OMP involves an orthogonal projection after each iteration. However, I had the idea that the first 13 atoms selected by OMP must be the correct ones (w.r.t. supports, not amplitudes), and this is echoed in the proof of the "Exact Recovery Theorem" of J. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Info. Theory, vol. 50, pp. 2231-2242, Oct. 2004.
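For concreteness, here is a minimal OMP sketch (illustrative only, not my experiment code; omp_sketch and its arguments are names I am making up here) showing where the two stopping rules enter: a cap on the number of iterations, and a threshold on the residual energy relative to the measurement energy.

function [xhat, support] = omp_sketch(Phi, y, maxiter, tol)
% Minimal OMP sketch (illustrative only). Phi: M x N matrix with unit-norm
% columns; y: measurements; maxiter: maximum number of atoms to select;
% tol: stop when the residual energy relative to ||y||^2 falls below tol
% (tol = 1e-10 corresponds to the 100 dB criterion).
N = size(Phi, 2);
r = y;                                 % residual
support = [];                          % indices of selected atoms
xhat = zeros(N, 1);
for k = 1:maxiter
    [~, idx] = max(abs(Phi' * r));     % atom most correlated with the residual
    support = [support, idx];
    coeffs = Phi(:, support) \ y;      % orthogonal projection onto selected atoms
    r = y - Phi(:, support) * coeffs;  % update residual
    if (r' * r) / (y' * y) < tol
        break;                         % residual energy criterion met
    end
end
xhat(support) = coeffs;

Stopping at the known sparsity corresponds to calling this with maxiter = 13 and tol = 0; the behavior I describe above came from instead using maxiter = 100 and tol = 1e-10.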
Today I received a nice message in my mailbox expressing appreciation for my work:

Dear Dr. Bob L. Sturm,
 
These are Journal of Communication and Computer (ISSN: 1548-7709, USA) & Computer Technology and Application (ISSN: 1543-7332, USA).
 
We have read your excellent paper ' MUSICAL INSTRUMENT IDENTIFICATION USING MULTISCALE MEL-FREQUENCY CEPSTRAL COEFFICIENTS ' from 'The 2010 European Signal Processing Conference (EUSIPCO 2010)' and we are pleased to pass on our regards to you. We are very interested in your research, if the paper mentioned has not been published in other journals or you have other unpublished papers in hand and have the idea of making our journals a vehicle for your research interests, please feel free to send electronic version to us.
From my post yesterday I have received several helpful comments:

  1. Igor at Nuit Blanche provided some analysis (Thanks Igor!) of my sampling setup with respect to the Donoho-Tanner phase transition. We see from the image below that with an undersampling ratio of \(\delta = 50/250 = 0.2\) (number of measurements over dictionary size), and a sparsity ratio of \(\rho = 13/50 = 0.26\) (sparsity over number of measurements), recovery by BP will not happen in most cases. (Under the solid blue curve live sparse vectors that can be recovered by BP from random projections onto sensing matrices that satisfy some restricted isometry property... I think.)



  2. A colleague has highlighted the connection between this phase transition and the "thresholding effect" observed in estimation in parametric models, i.e., the complete breakdown of an estimator when some model assumption is slightly violated.
  3. A commenter (Alejandro from the Colorado School of Mines --- hey Alejandro, I will be visiting my folks in Evergreen next week. Maybe we can get a cup of coffee in Golden?) has suggested that OMP can in fact correct itself through its reprojection. This is something that I must explore since I am astonished by it.
Warning: I go on and on, without any real direction, on various topics. Finally, I present some interesting results, which may or may not be related to something I have overlooked in my code.

Commenter Phil Schniter sent me the following explanation of my results from the past few weeks, with an insightful cross-disciplinary link:
Greedy methods are said to work better when there is a large dynamic range among the (active) coefficient magnitudes. Intuitively, if the largest coefficient is easy to spot, then a greedy method can identify it and subtract its contribution from the measurements. Then, if the largest coefficient in the residual is easy to spot, the greedy method will next identify it, and so on. But if, instead, there are multiple large coefficients with similar magnitudes, a greedy method can get confused. This behavior has been rigorously analyzed in the context of multiuser communications under the guise of "successive interference cancellation for CDMA".
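As a rough check of this intuition (a sketch with assumed problem sizes, not my experiment code), one can ask how often the very first greedy selection --- the atom most correlated with the measurements --- lands on the true support when the active coefficients are standard normal (large dynamic range) versus when their magnitudes are uniform on [1,2] (nearly equal):

M = 50; N = 250; K = 13; trials = 1000;            % assumed problem sizes
Phi = randn(M, N);
Phi = Phi ./ repmat(sqrt(sum(Phi.^2)), M, 1);      % one sensing matrix, unit-norm columns
hits = zeros(2, 1);
for t = 1:trials
    idx = randperm(N);
    for d = 1:2
        x = zeros(N, 1);
        if d == 1
            x(idx(1:K)) = randn(K, 1);                          % large dynamic range
        else
            x(idx(1:K)) = sign(randn(K, 1)).*(1 + rand(K, 1));  % similar magnitudes
        end
        y = Phi * x;
        [~, sel] = max(abs(Phi' * y));             % first greedy selection
        hits(d) = hits(d) + ismember(sel, idx(1:K));
    end
end
disp(hits / trials)   % fraction of trials where the first selected atom is in the true support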
I have received the following from my alma mater, University of California, Santa Barbara:

The Media Arts and Technology Program at the University of California, Santa Barbara, invites applications for a Lecturer, starting July 1, 2011. The department seeks candidates who will teach core and specialized courses in digital media programming and signal processing, with a focus on digital audio applicable to immersive, interactive, and distributed environments, working with high-dimensional data generated in scientific and artistic domains. The successful candidate will be expected to collaborate with artists, engineers, and scientists in an interdisciplinary environment of research, creative work, and teaching.

Media Arts and Technology (MAT) is a transdisciplinary graduate program at UCSB in both the College of Letters and Science (Division of Humanities and Fine Arts) and the College of Engineering. MAT offers Master's and PhD degrees and has approximately 40 graduate students and 10 faculty, several with joint appointments in engineering and arts departments. Areas of expertise include human-computer interaction, electronic music and sound design, computational visual and spatial arts, and multimedia signal processing. Offices and labs are housed in the new California Nanosystems Institute building at UCSB, which includes a unique research facility called the Allosphere, a three-story spherical immersive environment.

Additional information about the department can be found at http://www.mat.ucsb.edu. Applicants are expected to hold a doctoral degree in Media Arts and Sciences, Computer Science, Computer Music, or a closely related field, have demonstrated excellence in research, and have a strong commitment to teaching and interdisciplinary scholarship and/or creative activity.

The department is especially interested in candidates who can contribute to the diversity and excellence of the academic community through research, teaching, and service. Primary consideration will be given to applications received by February 1, 2010; however, the position will remain open until filled. Applications must include a CV, research and teaching statements, and at least three letters of reference. See http://www.mat.ucsb.edu/recruit for information on how to apply.

The University of California is an Equal Opportunity / Affirmative Action Employer.
I have received the following notice from my alma mater University of California, Santa Barbara:

The Media Arts and Technology Program at the University of California, Santa Barbara, invites applications for a tenure-track position at the assistant professor level, starting July 1, 2011. The department seeks candidates who will establish a vigorous research and teaching program in computer graphics, scientific/information visualization, or a related field applicable to immersive, interactive, and distributed environments, working with high-dimensional data generated in scientific and artistic domains. The successful candidate will be expected to collaborate with artists, engineers, and scientists in an interdisciplinary environment of research, creative work, and teaching.

Media Arts and Technology (MAT) is a transdisciplinary graduate program at UCSB in both the College of Letters and Science (Division of Humanities and Fine Arts) and the College of Engineering. MAT offers Master's and PhD degrees and has approximately 40 graduate students and 10 faculty, several with joint appointments in engineering and arts departments. Areas of expertise include human-computer interaction, electronic music and sound design, computational visual and spatial arts, and multimedia signal processing. Offices and labs are housed in the new California Nanosystems Institute building at UCSB, which includes a unique research facility called the Allosphere, a three-story spherical immersive environment. Additional information about the department can be found at http://www.mat.ucsb.edu.

Applicants are expected to hold a doctoral degree in Media Arts and Sciences, Computer Science, or a closely related field, have demonstrated excellence in research, and have a strong commitment to teaching and interdisciplinary scholarship and/or creative activity. The department is especially interested in candidates who can contribute to the diversity and excellence of the academic community through research, teaching, and service. Primary consideration will be given to applications received by February 1, 2011; however, the position will remain open until filled. Applications must include a CV, research and teaching statements, and at least three letters of reference. See http://www.mat.ucsb.edu/recruit for information on how to apply.

The University of California is an Equal Opportunity / Affirmative Action Employer.
Continuing from my experiments over the past few weeks, today I simulate sparse signal recovery from compressive measurements using both OMP and \(\ell_1\) minimization (the principle known as BP). I run 1000 independent trials (using the same sensing matrix, with elements iid normal random, and columns made unit norm) using a signal of length 250 samples with sparsity 11 compressed into 50 measurements. The magnitudes of the non-zero elements of the sparse vector are iid uniform on [1,2]. I now look at some of the cases where OMP and BP succeed and fail.
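In rough MATLAB terms, one trial of this setup looks something like the following (a sketch, not my actual code; the pinv line is only a stand-in for whichever recovery algorithm is being tested):

M = 50; N = 250; K = 11;
Phi = randn(M, N);
Phi = Phi ./ repmat(sqrt(sum(Phi.^2)), M, 1);         % iid normal, unit-norm columns
x = zeros(N, 1);
idx = randperm(N);
x(idx(1:K)) = sign(randn(K, 1)) .* (1 + rand(K, 1));  % magnitudes iid uniform on [1,2]
y = Phi * x;                                          % 50 compressive measurements
xhat = pinv(Phi) * y;          % placeholder: minimum l2-norm solution, NOT OMP or BP
success = norm(x - xhat)^2 <= 1e-10;                  % "exact recovery" criterion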
I have just finished presenting my portion of a PhD course here at Aalborg University, during which I talked for nearly 8 hours about sparse approximation for audio and music signals, as well as adaptive concatenative sound synthesis --- which I consider to be another "dictionary-based method". My slides are available in four parts here, under teaching.

Now I can focus on continuing my experiments from the past week, and in particular the interesting directions commenter Phil Schniter provides. For instance, for the sparse signals that greedy methods fail to reconstruct from compressive measurements, do we see a lack of dynamic range in the non-zero coefficients? Are there too many active elements too close together? More experiments will tell!
Last night I ran some experiments to compare the recovery behaviors of greedy approaches and \(\ell_1\) minimization (BP) for sparse vectors with non-zero elements distributed differently. First, I have elements distributed standard normal. Then, elements distributed Laplacian. Third, elements with magnitudes distributed uniformly in \([1,2]\). And finally, elements from \(\{-1,1\}\), with equal probability. (I normalize every sparse vector to unit norm.) Below we see how the average performance (200 trials) of three greedy approaches compares with that of BP --- which appears to stay quite consistent over all four distributions. The greedy approaches are wildly inconsistent, ranging from over 90% recovery to only 16% for vectors with sparsity above 0.25.
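For reference, the four coefficient draws look something like this in MATLAB (a sketch; the Laplacian draw uses the inverse-CDF method, and K here is just for illustration):

K = 13;                                       % assumed sparsity for illustration
c = cell(4, 1);
c{1} = randn(K, 1);                           % standard normal
u = rand(K, 1) - 0.5;
c{2} = -sign(u) .* log(1 - 2*abs(u));         % Laplacian (unit scale), via inverse CDF
c{3} = sign(randn(K, 1)) .* (1 + rand(K, 1)); % magnitudes uniform on [1,2], random signs
c{4} = sign(randn(K, 1));                     % +1 or -1 with equal probability
for d = 1:4
    c{d} = c{d} / norm(c{d});                 % normalize the non-zero part, i.e., the sparse vector, to unit norm
end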
I have spent the past week studying the performance of various greedy algorithms in recovering sparse signals from compressive measurements, with most interest in cyclic matching pursuit and cyclic orthogonal least squares. Then I looked at the performance of relaxing the strict sparsity criterion to the \(\ell_1\) norm, the principle known as Basis Pursuit (BP). It appeared that BP performs much worse than all the others --- which was strange to me because, with a sensing matrix of size 50 x 250, its coherence seems to me to preclude greedy methods from performing well at all. Thinking it was some artifact of using CVX to solve the convex problem, I used MATLAB's linear programming function, and found that BP still underperforms the greedy methods.

Then, a commenter (thank you Alejandro!) took my code and substituted Elad's sparse signal construction for mine and found the expected result: BP outperforms greedy approaches. We can see below that reconstruction by BP begins to fail only when the sparsity of the sensed signal exceeds 8/50, while all the greedy approaches begin to fail after 5/50. But actually, comparing the BP performance seen below to that of my previous study, we see its performance hasn't changed; rather, the performances of the greedy approaches have become worse.

[Figure: BP recovery probability]
[Figure: CMP recovery probability]
So what is the difference between the sparse signals that has caused these differences? Apparently, it is a magnitude thing. Before, I generated the signals like so:
x = zeros(N,1);                           % start from an all-zero signal of length N
activelements = randperm(N);
x(activelements(1:K)) = randn(K,1);       % K non-zero entries, iid standard normal
x = x./sqrt(x'*x);                        % normalize to unit norm
y = Phi*x;                                % compressive measurements
where N is 250, K is the sparsity, and Phi is the \(50 \times 250\) sensing matrix. These sparse signals apparently make greedy recovery methods far outshine the more complex and sophisticated approach using convex relaxation. Elad instead makes sparse signals like so:
x = zeros(N,1);                                           % start from an all-zero signal of length N
activelements = randperm(N);
x(activelements(1:K)) = sign(randn(K,1)).*(1+rand(K,1));  % magnitudes uniform on [1,2], random signs
x = x./sqrt(x'*x);                                        % normalize to unit norm
y = Phi*x;                                                % compressive measurements
The difference is embiggened and emboldened. All the non-zero elements in Elad's sparse signals have a magnitude uniformly distributed in \([1,2]\) (before normalization), whereas in my original approach I created sparse signals with non-zero elements having amplitudes distributed standard normal (before normalization). This is entirely confusing to me. Why should greedy methods work better than \(\ell_1\) minimization when entries are distributed normally?

Why Isn't BP Working?

With the advice of a commenter yesterday, I replaced my CVX code with that by Michael Elad, which is just the linear program presented in S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., vol. 20, pp. 33-61, Aug. 1998, i.e.,

xhatBP = linprog( ones(2*N,1), [], [], [Phi, -Phi], y, zeros(2*N,1), 100*ones(2*N,1) );
xhatBP = xhatBP(1:N) - xhatBP(N+1:end);

where N is the length of x before sensing by Phi, creating the measurements in y = Phi x. The last two inputs ensure that the solution has nonnegative elements between 0 and 100. With this change I ran OMP, PrOMP, and BP over 1000 trials, and found the same behavior as yesterday! (Code here.)
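To spell out why this linear program is BP: with the split \(\mathbf{x} = \mathbf{u} - \mathbf{v}\) where \(\mathbf{u}, \mathbf{v} \ge \mathbf{0}\), the program above minimizes \(\mathbf{1}^T(\mathbf{u} + \mathbf{v})\) subject to \(\mathbf{\Phi}\mathbf{u} - \mathbf{\Phi}\mathbf{v} = \mathbf{y}\). At a minimizer, \(\mathbf{u}\) and \(\mathbf{v}\) cannot both be non-zero in the same entry (otherwise reducing both lowers the cost without changing the constraint), so \(\mathbf{1}^T(\mathbf{u} + \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|_1\), and the program is equivalent to minimizing \(\|\mathbf{x}\|_1\) subject to \(\mathbf{\Phi}\mathbf{x} = \mathbf{y}\). (The upper bound of 100 is just a box constraint that should stay inactive for these unit-norm signals.)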

Why is this relaxed sparsity constraint performing so poorly with respect to the probability of "exact recovery" (which is when \(\|\mathbf{x} - \hat{\mathbf{x}}\|_2^2 \le 10^{-10}\))? I remember reading somewhere that "a good rule of thumb is \(M \approx 5K\)," where the sparsity is \(K\), and the number of measurements is \(M\); but that seems a bit low looking at these results. I am still not convinced this underperformance of BP is not due to some numerical error, or my naïveté in selecting a proper sensing matrix.

[Figure: probability of exact recovery vs. sparsity for OMP, PrOMP, and BP]
Continuing my experiments from the previous few days, I have now coded and experimented with Probabilistic OMP (PrOMP), and the convex relaxation of the strict sparsity measure to an \(\ell_1\)-norm of the solution, a.k.a. Basis Pursuit (BP for short). The problem set-up is the same as before. I am not including any noise. PrOMP turned out to be a real hassle to tune, but I finally settled on \(m=2\), \(\epsilon = 0.001\), and stopped the search when either \(\|\mathbf{r}\|^2 \le 10^{-22}\) or 10,000 iterations have elapsed. (An explanation of these values is here.) For BP I used CVX, with the following

cvx_begin
      variable xhatBP(dimensions);
      minimize( norm(xhatBP,1) );
      subject to
          y == Phi*xhatBP;
cvx_end

where Phi is the same kind of matrix used before, and y are the measurements. This time, under time constraints, I only looked at sparsity values up to 0.4, and I only performed 500 runs of each algorithm (each with the same sensing matrix and sparse vector). Below we see a comparison of the probability of "exact recovery" as a function of sparsity of these methods with MP and OMP.

[Figure: probability of exact recovery vs. sparsity for MP, OMP, PrOMP, and BP]
First, it is immediately clear that MP is now making a fool of itself. Why does it keep showing up to this party? BP, however, is surprisingly poor! (Unless I have made some error in the code.) I don't know why this is, but there was little change when I forced the sensing matrix to have a coherence of less than 0.5. OMP does well, and PrOMP does a little better.

However, in coding I realized something bad about PrOMP. PrOMP selects atoms at random based on a distribution that places most probability mass on the atoms most coherent with the residual, and less on all the others (and zero on those atoms already selected, or with zero-length projections). PrOMP terminates when the residual norm is exactly zero, at which time it serves up the solution. But even though the residual norm is zero, we are not guaranteed the solution is correct! There are infinitely many solutions \(\mathbf{x}\) such that \(\|\mathbf{y} - \mathbf{\Phi}\mathbf{x}\| = 0\) by virtue of \(\mathbf{\Phi}\) being fat and full rank. It has a null space with which we can perturb the sparsest solution and still have a zero-norm residual; and here cometh the spark of \(\mathbf{\Phi}\), which only tells us that any such alternative solution must be that much less sparse than the sparsest one. (In my experiments, the null space of the dictionaries has dimension at least 200, so there is plenty of room for such alternatives.) Thus, there is no guarantee PrOMP serves up the correct solution. Yet still! PrOMP does a fine job of outperforming OMP and BP.
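To illustrate the point about zero-residual solutions (a sketch with assumed dimensions, not my experiment code): perturbing a solution by a vector from the null space of \(\mathbf{\Phi}\) leaves the residual at zero while destroying sparsity.

M = 50; N = 250; K = 11;                       % assumed dimensions
Phi = randn(M, N);
Phi = Phi ./ repmat(sqrt(sum(Phi.^2)), M, 1);  % unit-norm columns
x = zeros(N, 1);
idx = randperm(N);
x(idx(1:K)) = randn(K, 1);                     % the K-sparse solution
y = Phi * x;
Z = null(Phi);                                 % orthonormal basis for the (N - M)-dimensional null space
xalt = x + Z * randn(size(Z, 2), 1);           % a dense alternative with the same measurements
disp([norm(y - Phi * x), norm(y - Phi * xalt)])   % both residuals are (numerically) zero
disp([sum(x ~= 0), sum(abs(xalt) > 1e-10)])       % but xalt is far from K-sparse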

My code is attached. Next I shall experiment with re-weighted \(\ell_1\) and \(\ell_2\) methods, iterative hard thresholding, CoSaMP, StOMP, regularized OMP, A*OMP, two-stage thresholding, etc.

A Musical Survey

Hear ye, hear ye, the following I just received from electronic music researcher and composer-performer Nick Collins.


You are invited to take part in an online survey on musical influences. This is part of an exploratory study into musician's attitudes to influence and the creative process. It would be great to get your involvement, and hear your opinions!

Responses are anonymous, and you are under no obligation to complete the survey even if you start it; however, if you do choose to submit the survey, the anonymised data will be used to inform the study, and may appear in a future publication. Completing this survey should take around 15 minutes, and involves giving some quantitative ratings to statements, and some written responses to questions. I'm really interested in anything you have to say on the subject!

There is no financial reward for taking part; the only reward is my immense gratitude, and a promise that any resulting publication will be publicised on this mailing list in due course.

WEB LINK TO SURVEY: http://www.informatics.sussex.ac.uk/users/nc81/survey/phpESP/public/survey.php?name=musicalinfluences1

Many thanks for your time,
Nick Collins
Music Informatics
University of Sussex
Yesterday I presented some results of CMP recovering compressively sensed sparse signals. Overnight I ran them again, but this time with orthogonal least squares (OLS) and cyclic orthogonal least squares (COLS) --- which is presented in my paper discussed here.

[Figure: probability of exact recovery vs. sparsity]
For the same experimental conditions as before, above we can clearly see that all greedy cousins of MP begin to fail at nearly the same sparsity of 0.14, but COLS "fails better" than all the others --- almost 10% better recovery at a sparsity of 0.35. Still, I wouldn't want my banking information recovered like so. Below we see the mean \(\ell_2\) norm of the approximation error. We see that COLS provides less mean error than the others. (The curve for OLS had a bunch of NaNs, so I don't know what is going on there. I predict it will be a bit east of the CMP line, and north of the COLS line.)

[Figure: mean \(\ell_2\) norm of the approximation error vs. sparsity]
Now, I wonder if CMP and COLS can be further improved by reconsidering multiple atoms at a time instead of one?

Try this at home! Feel free to reproduce my results with this MATLAB code. It is much simpler than the code presented here since I am not considering complex multiscale time-frequency dictionaries.


About this Archive

This page is an archive of entries from December 2010 listed from newest to oldest.

November 2010 is the previous archive.

January 2011 is the next archive.
