I have been experimenting with the approach to feature extraction posed in J. Andén and S. Mallat, "Multiscale scattering for audio classification," Proc. Int. Soc. Music Info. Retrieval, 2011. Specifically, I have substituted these "scattering coefficients" for the features used by Bergstra et al. 2006 into AdaBoost for music genre recognition.

The idea behind the features reminds me of the temporal modulation analysis of Panagakis et al. 2009, which itself comes from S. A. Shamma, "Encoding sound timbre in the auditory system", IETE J. Research, vol. 49, no. 2, pp. 145-156, Mar.-Apr. 2003. One difference is that these scattering coefficients are not psychoacoustically derived, yet they appear just as powerful as those that are.

More Experiments with MOD

| No Comments
Continuing from my experiments last week, I have decided to test to what degree the problems of MOD are inherited from the problems of OMP as the choice in its approximation step. So as before, I run MOD for 1000 iterations to learn dictionaries of cardinality 128 from 256 measurements generated from 8 length-64 atoms (each sampled from the uniform spherical ensemble). The weights of the linear combination are iid Normal. In these experiments, however, I substitute the oracle support and find the weights by least squares. So, given the \(l\)th measurement is composed of atoms indexed by \(I_l\), after each dictionary update, I find the new weights by an orthogonal projection onto the span of the atoms indexed by \(I_l\). In this way, we remove OMP to see the behavior of MOD in the best case scenario: known support. Finding the support is the hardest part anyway.

Below are mean coding errors over all 1000 iterations for 100 independent trials. We see MOD found the correct dictionary less than 10% of the time.

normal_error_oracle.png A close-up of where the majority end shows a mean error of about -20 dB. Using instead OMP, and picking 3 times the number of atoms with debiasing, we see a mean error of about -9 dB in the same experimental setup. So, clearly, knowing the support does help.

normal_error_oracle_closeup.png These results appear better than those reported in B. Mailhé and M. D. Plumbley, "Dictionary Learning with Large Step Gradient Descent for Sparse Representations", Proc 10th International Conference on Latent Variable Analysis and Source Separation (LVA/ICA 2012), Tel-Aviv, Israel, LNCS 7191, pp. 231-238, March 12-15, 2012.

My further experimentation reveals that OMP often finds the support of all measurements when using the real dictionary; and when it is run to twice or three times the expected sparsity, it always finds it (at least in the 100 independent trials I am running). This is not so surprising since this dictionary, sampled from the uniform spherical ensemble, likely has very low coherence. So, this says to me that the failures of MOD in this case are due mostly to the dictionary update, which obviously hinders good coding by OMP, which obviously hinders the dictionary update, which clearly hinders good coding, and so on.

Next up is looking at MOD with shift-invariant dictionaries.

Experiments with MOD

| No Comments
Having last week successfully understood MOD, and gained proper respect for the importance of initialization and dealing with evaporating atoms, I ran a set of experiments over the weekend testing MOD with a more difficult problem size. Before, I was having MOD run for 1000 iterations to learn dictionaries of cardinality 5 of length-3 atoms from 10 measurements of 1-sparse signals with OMP as the coding step --- which makes it MP since it only selects the one best atom. We often see after some iterations, the coding error of the dictionary produced by MOD reaches numerical precision. In my new experiments, I am having MOD run for 1000 iterations learning dictionaries of cardinality 128 of length-64 atoms (each sampled from the uniform spherical ensemble) from 256 measurements of 8-sparse signals (non-zero elements distributed Normally), again with OMP as the coding step. (I make sure the collection of 256 sparse signals spans \(\mathcal{R}^{128}\).) But here, I test different stopping conditions for OMP. First, I have OMP select only 8 atoms for coding a measurements. Second, I have OMP select 16 atoms, and from this I select the 8 with the largest magnitude weights, and then orthogonally project onto the span of these. Finally, I have OMP select 24 atoms, from which I select 8 and project in the same. I perform each test for 100 realizations. Below are the results for each realization.

normal_error_typeOMPChol_1000.png On the left plot, we show the mean coding error as a function of MOD iteration for each set of 100 realizations. It is clear some learning is going on in all three cases. On the right, we plot the distribution (bars) of the final coding error at iteration 1000, and its cumulative distribution (lines). It appears that we aid MOD by increasing the number of atoms selected by OMP, and then choose from among them. There are clearly three significantly different distributions. However, we do not see MOD reaches anything close to numerical precision.

Now, I wonder, what happens when the sparse vectors are distributed Bernoulli-Rademacher? Are there not any atoms produced by MOD that match any in the dictionary? What happens with a highly correlated, e.g., shift-invariant, dictionary? And how does K-SVD perform? And how does the Olhausen and Field approach perform?

All of these questions are motivated by the interesting work in B. Mailhé and M. D. Plumbley, "Dictionary Learning with Large Step Gradient Descent for Sparse Representations", Proc 10th International Conference on Latent Variable Analysis and Source Separation (LVA/ICA 2012), Tel-Aviv, Israel, LNCS 7191, pp. 231-238, March 12-15, 2012.
I just came across these two interesting issued patents on concatenative synthesis: Just one word difference in their titles, and the latter patent was filed on the same day as the former one for one year; yet both are granted. That is strange.

Einstein on the Beach

| No Comments
Two tickets please. I just bought mine for the show next week!

Really, if it wasn't for accidentally 13-year-old-me checking out a Philip Glass CD from my small-town library, I probably wouldn't be doing signal processing.
In theory, things can look really good; but then when it comes to implementation, wrinkles arise. Only after some hair pulling did I realize that there may be a step missing in MOD, which I cannot find in the original paper: K. Engan, S. O. Aase, and J. H. Husøy, "Method of optimal directions for frame design", Proc. ICASSP, Vol. 5, pp. 2443 - 2446, 1999.
In K. Engan, S. O. Aase, and J. H. Husøy, "Method of optimal directions for frame design", Proc. ICASSP, Vol. 5, pp. 2443 - 2446, 1999, the authors propose what is essentially a dictionary learning algorithm inspired by the generalized Lloyd algorithm. Given a set of \(L\) measurements \(\mathcal{U} := \{\vu_i \in \mathcal{R}^m\}\), expressed as the columns in the matrix \(\MU\), we want to learn a set of vectors in \(\mathcal{R}^m\) that minimize some distortion. So the generalized Lloyd algorithm proposes the following steps.

  1. Define the number of atoms \(N\) desired, and the order \(s\) to use for each approximation of the elements of \(\mathcal{U}\), and \(i \leftarrow 0\).
  2. Create an initial dictionary of \(N\) atoms, expressed as a matrix \(\MPhi^{(0)}\).
  3. With \(\MPhi^{(i)}\) and any algorithm of your choosing, find an \(s\)-order approximation of each element in \(\mathcal{U}\) such that \(\tilde\vu_i := \MPhi^{(i)}\vx_i\) where \(\vx_i \in \mathcal{R}^N\) contains the \(s\) non-zero approximation weights and zeros in all other elements; define the set of residuals \(\mathcal{E} := \{\vr_i := \vu_i - \tilde\vu_i \}\)
  4. Adjust the dictionary \(\MPhi^{(i+1)} \leftarrow f(\MU,\MPhi^{(i)},\MX)\)
  5. \(i \leftarrow i+1\)
  6. Go to step 3.
While step 4 can be done in many ways, the method of optimal directions (MOD) prescribes a specific approach.
In an attempt to figure out why orthogonal matching pursuit (OMP) performs better than the guarantees that have been found, I am working my way through this illuminating paper: M. D. Plumbley, "On polar polytopes and the recovery of sparse representations", IEEE Trans. Info. Theory, vol. 53, no. 9, pp. 3188-3195, Sep. 2007. In particular, I was struck by Theorem 10. But first, some background.

Recall Tropp's exact recovery condition (ERC) here or here, or alternatively here. Suppose we have some matrix \(\MD\) with unit norm columns (for notational convenience), and measurements \(\vx\), and we know \(\vx\) is a linear combination of \(s > 0\) columns of \(\MD\), specifically the ones indexed by \(\Phi\). In other words, \(\vx\) has an \(s\)-term representation. Orthogonal matching pursuit (or basis pursuit) is guaranteed to find these \(s\) columns composing \(\vx\) if $$ \max_{i \in \Psi} ||\MD_\Phi^\dagger \vd_i ||_1 < 1 $$ where \(\MD_\Phi\) is a matrix of the \(s\) columns of \(\MD\) indexed by \(\Phi\), and \(\Psi\) indexes the columns of \(\MD\) that are not participating in \(\vx\), (\(\vd_i\) is the \(i\)th column of \(\MD\)). Note that this means OMP (or BP) can recover any \(s\)-term representation from \(\vx\) only when the above is true for every \(s\)-size subset of columns from \(\MD\). When this is true, OMP (or BP) is a "correct algorithm" for \(s\)-term representations in \(\MD\).

Now Theorem 10 in this paper states that the fact a dictionary satisfies ERC for all \(s\)-term representations is not sufficient to guarantee it satisfies ERC for all representations involving \(k < s\) terms. The proof is constructive. Define some \(2\times 2\) dictionary. Without looking, and without computing, we can say exactly which columns compose the measurements when \(s = 2\). Thus, we have exact recovery; and indeed the ERC above says so since \(\Psi\) is empty. Now define the dictionary $$ \MD = [\vd_1 | \vd_2] := \left [ \begin{array}{cc} 1 & \sqrt{2} \\ 0 & \sqrt{2} \end{array} \right ] $$ and let \(\vs := [1, 0]^T\). Then the ERC becomes $$ | \langle [\sqrt{2}, \; \sqrt{2}]^T, [1, \; 0]^T \rangle | = \sqrt{2} > 1 $$ and thus the ERC fails. However, with OMP (or BP), or good eyeballing, we can always recover a \(\vs\) if \(\vx\) is composed of one column of this \(\MD\). The take-home point of this example is: just because the ERC is satisfied for all measurements involving \(s\) columns of \(\MD\) doesn't mean it is also satisfied for all measurements involving fewer columns.

That is a strange result, and one deserving to be inspected.

Computer Music Judges

| No Comments
Just today on BBC News, a nice story about some of the first computerized music judges. More specific details of this interesting event are here.

Here are the winners.

Scatter is on its way back!

| No Comments
I am all moved into London now, and my first task was to get Scatter up and running again. After several dozen hours of hacking code that was dropped over 3 years ago, "Build succeeded" was music to my ears. Here is its first picture as it awoke from its slumber:

SCATTER01.png It is buggy, and crashing every now and then. But here is what I hope to have recreated very soon:

Recent Comments

  • Jort Gemmeke: Sorry didnt make it through the first 10 without error read more
  • Bob L. Sturm: Thanks for trying! Looks like you need to run it read more
  • Eric W. Tramel: I don't have a local copy of Matlab, so I read more
  • Pardis: I was wondering, if one could translate the following, read more
  • Bob L. Sturm: Even more reason to view with suspicion that these algorithms read more
  • corey: Maybe I have my genres mixed up, but I thought read more
  • corey: The features you are using may be low dimensional, but read more
  • Bob L. Sturm: The spaces in which I am comparing features have dimensions read more
  • corey: The Tzanetakis set contains 100, 30 second clips for each read more
  • Jort Gemmeke: Hi Bob, Off course, you can turn this around and read more

Recent Assets

  • classificationaccuracies_stump_11025.png
  • classificationaccuracies_stump_22050.png
  • normal_error_oracle_closeup.png
  • normal_error_oracle.png
  • normal_error_typeOMPChol_1000.png
  • normal_error_type4_1000.png
  • normal_error_type3_1000.png
  • normal_error_type2_1000.png
  • normal_error_type2.png
  • normal_error_type1.png

Find recent content on the main index or look in the archives to find all content.