# August 2012 Archives

## A Smattering of Papers from EUSIPCO 2012, pt. 4

And now for the last installment.

"Clustering Before Training Large Datasets - Case Study: K-SVD" by C. Rusu
Save computation by preprocessing the training data to reduce its size, and then apply K-SVD to learn a dictionary.
"Binarization of Consensus Partition Matrix for Ensemble Clustering" by B. Abu-Jamous, R. Fa, A. Nandi and D. Roberts
Take a dataset, cluster it in multiple ways, and then combine these results using a consensus procedure. I think that could be very useful for what I am about to do. One relevant reference is A. Weingessel, E. Dimitriadou, and K. Hornik, "An ensemble method for clustering," in DSC 2003 Working Papers, 2003; and H. G. Ayad and M. S. Kamel, "On voting-based consensus of cluster ensembles," Pattern Recognition, vol. 43, pp. 1943-1953, 2010.
"Iterated Sparse Reconstruction for Activity Estimation in Nuclear Spectroscopy" by Y. Sepulcre and T. Trigano
This paper presents an approach for sparse decomposition by applying LARS to solve the LASSO for a given regularization parameter, and then decreasing the parameter. Here they are interested in only estimating the mean number of arrivals per unit time. Also I see I should read T. Zhang, "Adaptive Forward-Backward greedy algo- rithm for learning sparse representations," IEEE Trans- actions on Information Theory, vol. 57, no. 7, pp. 4689- 4708, 2011.
"An Analysis Prior Based Decomposition Method for Audio Signals" by O. Akyildiz and I. Bayram
This paper proposes an approach to decomposing an audio signal into transient and tonal components. ilker makes his code available here. Aside from using the analysis formulation, it is essentially the same as in K. Siedenburg and M. Dörfler, "Structured sparsity for audio signals," in Proc. Int. Conf. Digital Audio Effects (2011). It is also quite close to L. Daudet, "Sparse and Structured Decompositions of Signals with the Molecular Matching Pursuit", IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 5, pp. 1808-1816, Sep. 2006. Also quite close are: B. L. Sturm, J. J. Shynk, and S. Gauglitz, "Agglomerative clustering in sparse atomic decompositions of audio signals," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., (Las Vegas, NV), pp. 97-100, Apr. 2008; and B. L. Sturm, J. J. Shynk, A. McLeran, C. Roads, and L. Daudet, "A comparison of molecular approaches for generating sparse and structured multiresolution representations of audio and music signals," in Proc. Acoustics, (Paris, France), pp. 5775-5780, June 2008.
"Low Complexity Approximate Cyclic Adaptive Matching Pursuit" by A. Onose and B. Dumitrescu
I think this paper presents a sparse approximation algorithm, but it seems so strongly tied to estimating a slowly-varying FIR filter that it might not generalize. The paper cites the original cyclic matching pursuit work by Christensen and Jensen 2007, but does not say how the presented algorithm is different. This is probably stated in: A. Onose and B. Dumitrescu, "Cyclic Adaptive Matching Pursuit," in Proc. ICASSP, Kyoto, Japan, Mar. 2012.
"Audio Source Separation Informed by Redundancy with Greedy Multiscale Decompositions" by M. Moussallam, G. Richard and L. Daudet
The paper presents the "jointly adaptive matching pursuit" to decompose audio mixtures. Is it a generalized version of: R. Gribonval, Sparse decomposition of stereo signals with matching pursuit and application to blind separation of more than two sources from a stereo mixture,'' in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, (Orlando, FL), pp. 3057-3060, May 2002?
"Adaptive Distance Normalization for Real-time Music Tracking" by A. Arzt, G. Widmer and S. Dixon
Combining spectral and onset features, with the appropriate changes to the distance measure, significantly helps music alignment and real-time tracking.
"Assessment of Subjective Audio Quality From EEG Brain Responses Using Time-Space-Frequency Analysis" by C. Creusere, J. Kroger, S. Siddenki, P. Davis and J. Hardin
This idea is very intriguing. Forget asking people about perceptual quality; just have them bring their brains in for testing. The experiments, appear to change the quality of audio by either lowpass filtering, or scaling something, and then EEG measurements are classified. After several passes, I can't understand what is happening; but music by Beethoven and Blondie are involved.
"Catalog-Based Single-Channel Speech-Music Separation with the Itakura-Saito Divergence" by C. Demir, A. T. Cemgil and M. Saraclar
The catalog are those jingles that will interfere with speech signals on, e.g., news channels. This approach can significantly decrease the word error rate for automatic speech recognition.

## A Smattering of Papers from EUSIPCO 2012, pt. 3

So many papers, so little time.

"Music Structure Analysis by Subspace Modeling" by Y. Panagakis and C. Kotropoulos
This paper applies subspace clustering of beat-aligned auditory temporal modulation features to extracting the structure of musical signals. This is an interesting unsupervised method for discovering structure. Subspace clustering is reinvented in R.Vidal, "Subspace clustering," IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 52-68, 2011. It is originally proposed in B. V. Gowreesunker and A. H. Tewfik, "A novel subspace clustering method for dictionary design," in Proc. ICA, 2009, vol. 5441, pp. 34-41; B. V. Gowreesunker and A. H. Tewfik, "A shift tolerant dictionary training method," presented at the Signal Processing With Adaptive Sparse Structured Representations (SPARS), Saint Malo, France, 2009, INRIA Rennes--Bretagne Atlantique; and B. V. Gowreesunker and A. H. Tewfik, "Learning sparse representation using iterative subspace identification" IEEE Transactions on Signal Processing, Vol. 58, no. 6, June 2010.
"A Framework for Fingerprint-Based Detection of Repeating Objects in Multimedia Streams" by S. Fenet, M. Moussallam, Y. Grenier, G. Richard and L. Daudet
Take the Shazam fingerprint method, and generate the anchors by a matching pursuit that emphasizes atom diversity instead of peak finding in a spectrogram.
"Evolutionary Feature Generation for Content-based Audio Classification and Retrieval" by T. Mäkinen, S. Kiranyaz, J. Pulkkinen and M. Gabbouj
An approach to optimizing features using particle swarms? I like it already.
"Robust Retina-based Person Authentication Using the Sparse Classifier" by A. Condurache, J. Kotzerke and A. Mertins
Sparse representation classification goes CSI. The paper does not mention the computational approach used to find the sparse representations.
"Gammatone Wavelet Features for Sound Classification in Surveillance Applications" by X. Valero and F. Alías
The paper proposes new features for discriminating between different sounds, such as dogs barking, people talking or screaming, guns shooting, feet stepping, and thunder clapping. This approach employs perceptual motivation for the features. Why handicap a classification or estimation system by human limitations?
"Daily Sound Recognition Using a Combination of GMM and SVM for Home Automation" by M. A. Sehili, D. Istrate, B. Dorizzi and J. Boudy
Somehow GMMs and SVMs are combined with sequence discrimination. I need to read closer since it looks really interesting.
"Enhancing Timbre Model Using MFCC and Its Time Derivatives for Music Similarity Estimation" by F. de Leon and K. Martinez
In past work, MFCC features have often been concatenated with delta MFCC and delta^2MFCC. This paper looks at the effect on classification of treating these separately using bags of frames of features (BFFs). Genre classification of musical signals shows differences between these approaches. But shouldn't a proper scaling of the dimensions work the same?
"Classification of Audio Scenes Using Narrow-Band Autocorrelation Features" by X. Valero and F. Alías
This paper proposes treating separately the bands of a multiband decomposition of music signals. The four low-level features extracted come from the autocorrelation of each separate band. These low-level features are tested in discriminating between acoustic environments (such as classroom and library).
"Large Scale Polyphonic Music Transcription Using Randomized Matrix Decompositions" by I. Ari, U. Simsekli, A. T. Cemgil and L. Akarun
This looks like very fine work employing randomized factorizations that can handle large datasets. The paper points to P.Smaragdis, "Polyphonic Pitch Tracking by Example," in Proc. IEEE WASPAA, pp. 125-128, 2011.
"Searching for Dominant High-Level Features for Music Information Retrieval" by M. Zanoni, D. Ciminieri, A. Sarti and S. Tubaro
This paper attacks the problem of making features more discriminable by clustering, and tests the new methods in the context of genre recognition. Groups of music excerpts associated with features that are near cluster centroids are evaluated by humans in semantic terms, which shows some interesting high-level properties, e.g., music that is "Groovy" or "Classic."
"AM-FM Modulation Features for Music Instrument Signal Analysis and Recognition" by A. Zlatintsi and P. Maragos
This paper describes applying perceptual-based features to identifying musical instruments. Tests on monophonic instrument recordings (is the IOWA dataset of isolated notes, or musical phrases?) give good results.
"Analysis of Speaker Similarity in the Statistical Speech Synthesis Systems Using a Hybrid Approach" by E. Guner, A. Mohammadi and C. Demiroglu
This work will be useful for the iPad game I will create someday.
"A Geometrical Stopping Criterion for the LAR Algorithm" by C. Valdman, M. Campos and J. Apolinário Jr.
The paper applies something called a Volterra filter to determine when to stop LAR. I have heard LARS is related to subspace pursuit, so this paper could provide some interesting ideas.
"Signal Compression Using the Discrete Linear Chirp Transform (DLCT)" by O. Alkishriwo and L. Chaparro
This paper proposes using a chirp transform for audio compression. Essentially, the algorithm estimates chirp parameters and a amplitude for each frame. The authors apply this algorithm to speech and bird sounds, and compare its performance to that of a compressed sensing of the audio. This makes no sense to me. That is like racing a red Ferrari against a red tomato, which makes no sense to me either. The paper says, "[bird song] is sparser in the time domain than in the frequency domain." What?? The figures are useless, but the description claims the chirp transform method is better.

## A Smattering of Papers from EUSIPCO 2012, pt. 2

Continuing with some papers selected from EUSIPCO 2012.

"How to Use Real-Valued Sparse Recovery Algorithms for Complex-Valued Sparse Recovery?" by A. Sharif-Nassab, M. Kharratzadeh, M. Babaie-Zadeh and C. Jutten
This appears to be a very nice practical paper. It shows that, as long as the sparsity is a quarter the spark of the dictionary, one need not solve a second-order cone problem for complex sparse recovery using error constrained $$\ell_1$$-minimization, but instead pose it as a linear program. The paper also corrects a few misstatements in the literature.
"A Greedy Algorithm to Extract Sparsity Degree for l1/l0-equivalence in a Deterministic Context" by N. Pustelnik, C. Dossal, F. Turcu, Y. Berthoumieu and P. Ricoux
This paper recalls the polytope interpretation underlying the work of Donoho and Tanner to find the class of signals that are not recoverable by error constrained $$\ell_1$$-minimization from compressed sampling in a deterministic sensing matrix. This paper definitely deserves a deeper read, and reminds me to return to some work from: M. D. Plumbley, "On polar polytopes and the recovery of sparse representations", IEEE Trans. Info. Theory, vol. 53, no. 9, pp. 3188-3195, Sep. 2007.
"Choosing Analysis or Synthesis Recovery for Sparse Reconstruction" by N. Cleju, M. Jafari and M. Plumbley
This paper explores where the analysis and synthesis approaches to sparse recovery are different. We see with more measurements, the synthesis formulation becomes better for sparser signals, and the analysis formulation better for cosparser signals. Furthermore, the analysis formulation is more sensitive to sparse signals that are approximately sparse. This is a nice paper with a strong empirical component.
"CoSaMP and SP for the Cosparse Analysis Model" by R. Giryes and M. Elad
On the heels of the last paper, this one adapts CoSaMP and SP to the Analysis formulation. The paper also presents a nice table summarizing the synthesis and analysis formulations.
"Matching Pursuit with Stochastic Selection" by T. Peel, V. Emiya, L. Ralaivola and S. Anthoine
In order to accelerate matching pursuit, this work takes a random subset of the dictionary as in this work, but also a random subset of the dimensions. Thus, it need not compute full inner products. They show good approximation ability with a smaller computational price.
"Robust Greedy Algorithms for Compressed Sensing" by S. A. Razavi, E. Ollila and V. Koivunen
This paper presents modifications to OMP and CoSaMP wherein M-estimates are used to help guard against the effect of possibly impulsive noise.
"A Fast Algorithm for the Bayesian Adaptive Lasso" by A. Rontogiannis, K. Themelis and K. Koutroumbas
This paper takes the adaptive lasso and makes it faster to apply to recovery from compressive sampling. It appears to do well with measurements in noise, for both Bernoulli-Gaussian and Bernoulli-Rademacher signals.
"Audio Forensics Meets Music Information Retrieval - A Toolbox for Inspection of Music Plagiarism" by C. Dittmar, K. Hildebrand, D. Gaertner, M. Winges, F. Müller and P. Aichroth
A toolbox! For detecting music plagiarism! The authors have assembled a Batman belt of procedures in the context of the REWIND project. It tackles three types of plagiarism: sample, rhythm, and melody.
"Detection and Clustering of Musical Audio Parts Using Fisher Linear Semi-Discriminant Analysis" by T. Giannakopoulos and S. Petridis
This paper presents an approach to segmenting a musical signal using bags of frames of features (BFFs), and then Fisher linear discriminant analysis and clustering to find sections that are highly contrasting in relevant subspaces.
"Forward-Backward Search for Compressed Sensing Signal Recovery" by N. B. Karahanoglu and H. Erdogan
The idea here is nice. Expand your support set by some number of elements, and then reduce it by that number less one. The expansion is done by selecting the elements with the largest correlation with the residual; the shrinkage is done by removing the elements with the smallest correlation with the new residual. The experiments are run at a small problem size with increasing sparsity, and we see this algorithm performs favorably compared to OMP and SP --- though "exact reconstruction rate" is never defined. It would be interesting to see simulations using Bernoulli-Rademacher signals. One relevant publication missing from the references is: M. Andrle and L. Rebollo-Neira, "Improvement of Orthogonal Matching Pursuit Strategies by Backward and Forward Movements", Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 313-316, Toulouse, France, Apr. 2006. In that work, however, they apply this forward-backward business at the end of the pursuit.
"Fusion of Greedy Pursuits for Compressed Sensing Signal Reconstruction" by S. K. Ambat, S. Chatterjee and K. Hari
The idea is simple yet effective. Take two greedy pursuits, run them both, and combine their results to find one that is better. The experiments show favorable robustness to the distribution underlying sparse signals, as well as to noise. The cost is, of course, increased computation; but if it works, it works. I had a similar idea a while ago, but reviewers didn't like it. I remember one review said it was "too obvious." This fusion framework provides a nice way to get around the problem of selecting the best support of a single solver.
"Use of Tight Frames for Optimized Compressed Sensing" by E. Tsiligianni, L. Kondi and A. Katsaggelos
This paper adapts an approach by Elad for building Grassmannian sensing matrices for compressed sensing, and shows their performance is better than random sensing matrices with respect to mean squared reconstruction error.
"A Comparison of Termination Criteria for A*OMP" by N. B. Karahanoglu and H. Erdogan
"Classification From Compressive Representations of Data" by B. Coppa, R. Héliot, D. David and O. Michel
To what extent does compressive sampling hurt discriminatability? This paper experiments with it in fundamental ways to clearly show more measurements leads to fewer errors.

## Paper of the Day (Po'D): My EUSIPCO Papers Day 2 Edition

Tomorrow at EUSIPCO 2012, I present my third paper. And to remind myself of what I did, I present a Po'D: B. L. Sturm, "When 'exact recovery' is exact recovery in compressed sensing simulation," Proc. European Signal Processing Conference, Bucharest, Romania, Aug. 2012.

This paper is essentially described in this post, with a link to a failed submission that confused both reviewers. My EUSIPCO submission also confused a reviewer, but this time four other reviewers were not confused --- so I made some progress! Nonetheless, I made considerable changes for the camera ready paper to make its purpose even more clear, and to make as sure as I could that every claim is correct. I also changed the title from "When 'exact recovery' is exact recovery in compressive sampling" --- adding the "simulation" bit to make the following more clear. This paper is entirely a practical paper. It has not even an ounce of CS theory, and so it does not make any sort of contribution to CS theory. It is the first of a few papers I am writing about the effects of the decisions we make in simulating algorithms for recovering signals from compressive sampling. This interest grows out of reading many papers with differing experimental designs, not to mention my empirical work.

In simulating a CS recovery algorithm over the phase plane, we have to make a choice of when a signal is recovered. Of course, the definition of exact recovery should come from the application; but in these times of prolific algorithm design, we need to just simulate and compare. In this paper, I look closely at two common definitions that have been used: one based on support, and another based on normalized squared error. I seek to answer when they are the same, and what is the meaning of the parameter on the latter. I look at both noisy and noiseless conditions, making particular assumptions on the signals (which I call the "best case scenario"), and answer these questions in ways that show how these two exact recovery criteria are related, and how to set the parameter of one of them.

## A Smattering of Papers from EUSIPCO 2012, pt. 1

Hello, and welcome to brief presentations of some papers that are somehow relevant to my current research interests. Since there are so many interesting papers, I only take a cursory look over a few and jot down some notes --- which are probably inaccurate but might help me later when I need to find something I read.

"An Ellipsoid-Based, Two-Stage Screening Test for BPDN" by L. Dai and K. Pelckmans
From the title I first thought that BPDN was some malady; but it is the familiar "basis pursuit denoising" algorithm. Essentially, this paper presents an interesting way to reduce the computational cost of BPDN by performing a "screening" first to find elements that are highly likely to be zero. Previous work in this area come from: Z. J. Xiang, H. Xu, P. J. Ramadge, "Learning sparse representations of high dimensional data on large scale dictionaries," NIPS 2011; Z. J. Xiang, P. J. Ramadge, "Fast LASSO screening tests based on correlations," ICASSP 2012; and L.E. Ghaoui, V. Viallon, T. Rabbani, "Safe feature elimination in sparse supervised learning," Arxiv preprint arXiv:1009.3515, 2010. The figures in this paper are useless, so I can't really conclude anything without more closely reading the paper. :) This is a good opportunity for a public service announcement: Please, sympathize with readers who really want to understand your work: create figures that advertize and not antagonize.
"Online One-Class Machines Based on the Coherence Criterion," by Z. Noumir, P. Honeine and C. Richard
This is the first time I have heard of one-class classification. My first thought: what's the use? But then I thought, well, that is just detection. It is or it isn't. But after reading some more, I see it is the more deep problem of finding a way to detect an apple while only knowing apples and not other fruit. This paper builds a method for online learning of a one-class SVM using the coherence to limit the number of support vectors. It appears that elements will only be added to the dictionary of support vectors if they are all incoherent, which is simply to say they point in directions that are relatively orthogonal. It is impressive to see that the learning is two orders of magnitude faster than using the one-class SVM. Good work!
"Multi-sensor Joint Kernel Sparse Representation for Personnel Detection", by N. Nguyen, N. Nasrabadi and T. D. Tran
This paper appears to reinvent kernel sparse representation: P. Vincent and Y. Bengio, "Kernel matching pursuit", Machine Learning, vol. 48, no. 1, pp. 165-187, July 2002. It does extend this approach to joint sparse representation, and applies it to an interesting problem.

## Papers of the Day (Po'D): My EUSIPCO Papers Day 1 Edition

Tomorrow at EUSIPCO 2012 I present two papers. And to remind myself of what I did, I present some Po'D.

The first paper I present is as an oral presentation: B. L. Sturm and M. G. Christensen, "Comparison of Orthogonal Matching Pursuit Implementations," Proc. European Signal Processing Conference, Bucharest, Romania, Aug. 2012. And lucky me, I have already posted a Po'D here. The paper and presentation slides are available at the link above. (Today I decided that in the presentation I will skip the details of the implementations and go from the big picture to the results.)

The second paper I present is as a poster: P. Noorzad and B. L. Sturm, "Regression with sparse approximations of data," Proc. European Signal Processing Conference, Bucharest, Romania, Aug. 2012.

In this paper, we adapt sparse representation classification for use in regression, and specifically, local polynomial regression. Local regression is a non-parametric approach that emphasizes the importance of locally rather than globally fitting a surface to the regression function. The Taylor expansion facilitates local polynomial regression, but it requires the estimation of several parameters depending on the polynomial order (e.g., for locally linear we must estimate the gradient, and for locally quadratic the gradient and Hessian). This estimation problem is quite easy when using least squares optimization. However, we must define the contribution of each regressand to the local fit, which are called "observation weights." One approach for defining observation weights is to use the Euclidean distances between the k closest regressors and the point of interest. That gives "weighted k-nearest neighbor regression". Another approach is use a kernel to weight all regressands. That gives Nadaraya-Watson regression. In our approach --- sparse approximation weighted regression (SPARROW) --- we define these weights for a point of interest by its sparse approximation using the dictionary of normalized regressors. In our experiments, we find that the locally constant version of SPARROW can perform competitively with respect to other locally constant regression methods. Curiously, the locally linear version of SPARROW performs more poorly. Why does this happen? We don't yet know.

## Off to EUSIPCO 2012

After an extremely busy (paper deadlines, grant writing, job applications and interviews, class preparation) and entertaining (vacations in London and Jutland, the Olympics, collecting everything Disco) summer, I am off tomorrow to attend EUSIPCO 2012 in the lovely city of Bucharest in Romania (which will apparently be very warm).

This year I am presenting three papers:
I will provide Po'D for each of these during next week, as well as interesting things I find along the way.

## The Impact of Font

Here is a fantastic article on the impact of font on perception. I will now be writing all my grant applications using "Georgia."

Bob L. Sturm, Associate Professor
Audio Analysis Lab
Aalborg University Copenhagen
A.C. Meyers Vænge 15
DK-2450 Copenahgen SV, Denmark
Email: bst_at_create.aau.dk