The paper, R. B. Dannenberg, B. Thom, and D. Watson, "A machine learning approach to musical style recognition," in Proc. International Computer Music Conf., Thessaloniki, Greece, Sep. 1997, is regarded as the first to explore something like recognizing the genre of a musical signal. It proposes a system to determine the playing style of a musician. However, I have just discovered the following fascinating paper: K.-P. Han, Y.-S. Park, S.-G. Jeon, G.-C. Lee, and Y.-H. Ha, "Genre classification system of TV sound signals based on a spectrogram analysis," IEEE Transactions on Consumer Electronics, vol. 44, pp. 33-42, Feb. 1998. In that paper, they look at discriminating between speech and music, and Jazz, Classical and Popular genres. Not only do they simulate the algorithm, they actually implement the system using circuits and show the results. They also list the musical pieces they put in each genre dataset. Was Kansas Popular in 1998?

# Recently in *Signal Processing* Category

From: J. G. A. Barbedo and A. Lopes, "Automatic genre classification of musical signals," EURASIP Journal on Advances in Signal Processing, 2007.

The authors specify the meaning of each of these labels. For instance, "Dance" music has "strong percussive elements and very marked beating." Stemming from "Dance" there is "Jazz", "characterized by the predominance of instruments like piano and saxophone. Electric guitars and drums can also be present; vocals, when present, are very characteristic." And stemming from "Dance," stemming from "Jazz," there is "Cool", a "jazz style [that is] light and introspective, with a very slow rhythm." The genres "Techno" and "Disco" --- which both emphasize the importance of listening with your body and feet --- do not stem from "Dance," but instead from "Pop/Rock," "the largest class, including a wide variety of songs."

Props to the authors for attempting the impossible, but any taxonomy of music genre must be broken from the very first stem. Genres are not like species, and cannot be arranged like so. (On the plus side, it appears that to differentiate introspective music from non-introspective music requires only four spectral features computed over 21.3 ms windows.)

And now for the last installment.

"Clustering Before Training Large Datasets - Case Study: K-SVD" by C. Rusu

Save computation by preprocessing the training data to reduce its size, and then apply K-SVD to learn a dictionary.

"Binarization of Consensus Partition Matrix for Ensemble Clustering" by B. Abu-Jamous, R. Fa, A. Nandi and D. Roberts

Take a dataset, cluster it in multiple ways, and then combine these results using a consensus procedure. I think that could be very useful for what I am about to do. Two relevant references are: A. Weingessel, E. Dimitriadou, and K. Hornik, "An ensemble method for clustering," in DSC 2003 Working Papers, 2003; and H. G. Ayad and M. S. Kamel, "On voting-based consensus of cluster ensembles," Pattern Recognition, vol. 43, pp. 1943-1953, 2010.

"Iterated Sparse Reconstruction for Activity Estimation in Nuclear Spectroscopy" by Y. Sepulcre and T. Trigano

This paper presents an approach for sparse decomposition by applying LARS to solve the LASSO for a given regularization parameter, and then decreasing the parameter. Here they are interested in only estimating the mean number of arrivals per unit time. Also I see I should read T. Zhang, "Adaptive forward-backward greedy algorithm for learning sparse representations," IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4689-4708, 2011.

"An Analysis Prior Based Decomposition Method for Audio Signals" by O. Akyildiz and I. Bayram

This paper proposes an approach to decomposing an audio signal into transient and tonal components. İlker makes his code available here. Aside from using the analysis formulation, it is essentially the same as in K. Siedenburg and M. Dörfler, "Structured sparsity for audio signals," in Proc. Int. Conf. Digital Audio Effects (2011). It is also quite close to L. Daudet, "Sparse and structured decompositions of signals with the molecular matching pursuit," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 5, pp. 1808-1816, Sep. 2006. Also quite close are: B. L. Sturm, J. J. Shynk, and S. Gauglitz, "Agglomerative clustering in sparse atomic decompositions of audio signals," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., (Las Vegas, NV), pp. 97-100, Apr. 2008; and B. L. Sturm, J. J. Shynk, A. McLeran, C. Roads, and L. Daudet, "A comparison of molecular approaches for generating sparse and structured multiresolution representations of audio and music signals," in Proc. Acoustics, (Paris, France), pp. 5775-5780, June 2008.

"Low Complexity Approximate Cyclic Adaptive Matching Pursuit" by A. Onose and B. Dumitrescu

I think this paper presents a sparse approximation algorithm, but it seems so strongly tied to estimating a slowly-varying FIR filter that it might not generalize. The paper cites the original cyclic matching pursuit work by Christensen and Jensen 2007, but does not say how the presented algorithm is different. This is probably stated in: A. Onose and B. Dumitrescu, "Cyclic Adaptive Matching Pursuit," in Proc. ICASSP, Kyoto, Japan, Mar. 2012.

"Audio Source Separation Informed by Redundancy with Greedy Multiscale Decompositions" by M. Moussallam, G. Richard and L. Daudet

The paper presents the "jointly adaptive matching pursuit" to decompose audio mixtures. Is it a generalized version of: R. Gribonval, "Sparse decomposition of stereo signals with matching pursuit and application to blind separation of more than two sources from a stereo mixture," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, (Orlando, FL), pp. 3057-3060, May 2002?

"Adaptive Distance Normalization for Real-time Music Tracking" by A. Arzt, G. Widmer and S. Dixon

Combining spectral and onset features, with the appropriate changes to the distance measure, significantly helps music alignment and real-time tracking.

"Assessment of Subjective Audio Quality From EEG Brain Responses Using Time-Space-Frequency Analysis" by C. Creusere, J. Kroger, S. Siddenki, P. Davis and J. Hardin

This idea is very intriguing. Forget asking people about perceptual quality; just have them bring their brains in for testing. The experiments appear to change the quality of the audio by either lowpass filtering or scaling something, and then EEG measurements are classified. After several passes, I can't understand what is happening; but music by Beethoven and Blondie is involved.

"Catalog-Based Single-Channel Speech-Music Separation with the Itakura-Saito Divergence" by C. Demir, A. T. Cemgil and M. Saraclar

The catalog consists of those jingles that will interfere with speech signals on, e.g., news channels. This approach can significantly decrease the word error rate for automatic speech recognition.

So many papers, so little time.

"Music Structure Analysis by Subspace Modeling" by Y. Panagakis and C. Kotropoulos

This paper applies subspace clustering of beat-aligned auditory temporal modulation features to extracting the structure of musical signals. This is an interesting unsupervised method for discovering structure. Subspace clustering is reviewed in R. Vidal, "Subspace clustering," IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 52-68, 2011. It is originally proposed in B. V. Gowreesunker and A. H. Tewfik, "A novel subspace clustering method for dictionary design," in Proc. ICA, 2009, vol. 5441, pp. 34-41; B. V. Gowreesunker and A. H. Tewfik, "A shift tolerant dictionary training method," presented at the Signal Processing With Adaptive Sparse Structured Representations (SPARS) workshop, Saint Malo, France, 2009, INRIA Rennes--Bretagne Atlantique; and B. V. Gowreesunker and A. H. Tewfik, "Learning sparse representation using iterative subspace identification," IEEE Transactions on Signal Processing, vol. 58, no. 6, June 2010.

"A Framework for Fingerprint-Based Detection of Repeating Objects in Multimedia Streams" by S. Fenet, M. Moussallam, Y. Grenier, G. Richard and L. Daudet

Take the Shazam fingerprint method, and generate the anchors by a matching pursuit that emphasizes atom diversity instead of peak finding in a spectrogram.

"Evolutionary Feature Generation for Content-based Audio Classification and Retrieval" by T. Mäkinen, S. Kiranyaz, J. Pulkkinen and M. Gabbouj

An approach to optimizing features using particle swarms? I like it already.

"Robust Retina-based Person Authentication Using the Sparse Classifier" by A. Condurache, J. Kotzerke and A. Mertins

Sparse representation classification goes CSI. The paper does not mention the computational approach used to find the sparse representations.

"Gammatone Wavelet Features for Sound Classification in Surveillance Applications" by X. Valero and F. Alías

The paper proposes new features for discriminating between different sounds, such as dogs barking, people talking or screaming, guns shooting, feet stepping, and thunder clapping. This approach employs perceptual motivation for the features. Why handicap a classification or estimation system by human limitations?

"Daily Sound Recognition Using a Combination of GMM and SVM for Home Automation" by M. A. Sehili, D. Istrate, B. Dorizzi and J. Boudy

Somehow GMMs and SVMs are combined with sequence discrimination. I need to read closer since it looks really interesting.

"Enhancing Timbre Model Using MFCC and Its Time Derivatives for Music Similarity Estimation" by F. de Leon and K. Martinez

In past work, MFCC features have often been concatenated with delta MFCC and delta^2 MFCC. This paper looks at the effect on classification of treating these separately using bags of frames of features (BFFs). Genre classification of musical signals shows differences between these approaches. But shouldn't a proper scaling of the dimensions work the same?

"Classification of Audio Scenes Using Narrow-Band Autocorrelation Features" by X. Valero and F. Alías

This paper proposes treating separately the bands of a multiband decomposition of music signals. The four low-level features extracted come from the autocorrelation of each separate band. These low-level features are tested in discriminating between acoustic environments (such as classroom and library).

"Large Scale Polyphonic Music Transcription Using Randomized Matrix Decompositions" by I. Ari, U. Simsekli, A. T. Cemgil and L. Akarun

This looks like very fine work employing randomized factorizations that can handle large datasets. The paper points to P. Smaragdis, "Polyphonic Pitch Tracking by Example," in Proc. IEEE WASPAA, pp. 125-128, 2011.

"Searching for Dominant High-Level Features for Music Information Retrieval" by M. Zanoni, D. Ciminieri, A. Sarti and S. Tubaro

This paper attacks the problem of making features more discriminable by clustering, and tests the new methods in the context of genre recognition. Groups of music excerpts associated with features that are near cluster centroids are evaluated by humans in semantic terms, which shows some interesting high-level properties, e.g., music that is "Groovy" or "Classic."

"AM-FM Modulation Features for Music Instrument Signal Analysis and Recognition" by A. Zlatintsi and P. Maragos

This paper describes applying perceptual-based features to identifying musical instruments. Tests on monophonic instrument recordings (is the IOWA dataset of isolated notes, or musical phrases?) give good results.

"Analysis of Speaker Similarity in the Statistical Speech Synthesis Systems Using a Hybrid Approach" by E. Guner, A. Mohammadi and C. Demiroglu

This work will be useful for the iPad game I will create someday.

"A Geometrical Stopping Criterion for the LAR Algorithm" by C. Valdman, M. Campos and J. Apolinário Jr.

The paper applies something called a Volterra filter to determine when to stop LAR. I have heard LARS is related to subspace pursuit, so this paper could provide some interesting ideas.

"Signal Compression Using the Discrete Linear Chirp Transform (DLCT)" by O. Alkishriwo and L. Chaparro

This paper proposes using a chirp transform for audio compression. Essentially, the algorithm estimates chirp parameters and an amplitude for each frame. The authors apply this algorithm to speech and bird sounds, and compare its performance to that of a compressed sensing of the audio. This makes no sense to me. That is like racing a red Ferrari against a red tomato, which makes no sense to me either. The paper says, "[bird song] is sparser in the time domain than in the frequency domain." What?? The figures are useless, but the description claims the chirp transform method is better.
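If I understand the basic idea, each frame is modeled by a linear chirp whose rate, frequency, and amplitude must be estimated. Here is a minimal sketch of that idea by grid search over chirp rates; this is my own illustration with made-up parameters, not the authors' DLCT:

```python
import numpy as np

def fit_linear_chirp(x, fs, rates):
    """Grid-search the chirp rate (Hz/s); for each candidate, demodulate
    and pick the strongest DFT bin. Returns (rate, frequency, amplitude)."""
    n = len(x)
    t = np.arange(n) / fs
    best = (None, None, 0.0)
    for c in rates:
        # remove the quadratic phase of the candidate chirp
        y = x * np.exp(-1j * np.pi * c * t**2)
        Y = np.fft.fft(y)[: n // 2]          # keep positive frequencies
        k = int(np.argmax(np.abs(Y)))
        a = 2 * np.abs(Y[k]) / n             # amplitude of the sinusoid
        if a > best[2]:
            best = (c, k * fs / n, a)
    return best

# synthetic frame: 500 Hz start frequency, 2000 Hz/s sweep, amplitude 0.7
fs = 8000
t = np.arange(1024) / fs
x = 0.7 * np.cos(2 * np.pi * (500 * t + 0.5 * 2000 * t**2))
rate, f0, amp = fit_linear_chirp(x, fs, np.linspace(0, 4000, 81))
```

Demodulating by the correct candidate chirp turns the matching component into a pure sinusoid, so a single DFT peak yields its frequency and amplitude.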

Continuing with some papers selected from EUSIPCO 2012.

"How to Use Real-Valued Sparse Recovery Algorithms for Complex-Valued Sparse Recovery?" by A. Sharif-Nassab, M. Kharratzadeh, M. Babaie-Zadeh and C. Jutten

This appears to be a very nice practical paper. It shows that, as long as the sparsity is a quarter of the spark of the dictionary, one need not solve a second-order cone problem for complex sparse recovery using error-constrained \(\ell_1\)-minimization, but instead pose it as a linear program. The paper also corrects a few misstatements in the literature.

"A Greedy Algorithm to Extract Sparsity Degree for l1/l0-equivalence in a Deterministic Context" by N. Pustelnik, C. Dossal, F. Turcu, Y. Berthoumieu and P. Ricoux

This paper recalls the polytope interpretation underlying the work of Donoho and Tanner to find the class of signals that are not recoverable by error-constrained \(\ell_1\)-minimization from compressed sampling with a deterministic sensing matrix. This paper definitely deserves a deeper read, and reminds me to return to some work from: M. D. Plumbley, "On polar polytopes and the recovery of sparse representations," IEEE Trans. Info. Theory, vol. 53, no. 9, pp. 3188-3195, Sep. 2007.

"Choosing Analysis or Synthesis Recovery for Sparse Reconstruction" by N. Cleju, M. Jafari and M. Plumbley

This paper explores where the analysis and synthesis approaches to sparse recovery differ. We see that with more measurements, the synthesis formulation becomes better for sparser signals, and the analysis formulation better for cosparser signals. Furthermore, the analysis formulation is more sensitive to signals that are only approximately sparse. This is a nice paper with a strong empirical component.

"CoSaMP and SP for the Cosparse Analysis Model" by R. Giryes and M. Elad

On the heels of the last paper, this one adapts CoSaMP and SP to the analysis formulation. The paper also presents a nice table summarizing the synthesis and analysis formulations.

"Matching Pursuit with Stochastic Selection" by T. Peel, V. Emiya, L. Ralaivola and S. Anthoine

In order to accelerate matching pursuit, this work takes a random subset of the dictionary, as in previous work, but also a random subset of the dimensions. Thus, it need not compute full inner products. They show good approximation ability at a smaller computational price.

"Robust Greedy Algorithms for Compressed Sensing" by S. A. Razavi, E. Ollila and V. Koivunen

This paper presents modifications to OMP and CoSaMP wherein M-estimates are used to help guard against the effect of possibly impulsive noise.

"A Fast Algorithm for the Bayesian Adaptive Lasso" by A. Rontogiannis, K. Themelis and K. Koutroumbas

This paper takes the adaptive lasso and makes it faster to apply to recovery from compressive sampling. It appears to do well with measurements in noise, for both Bernoulli-Gaussian and Bernoulli-Rademacher signals.

"Audio Forensics Meets Music Information Retrieval - A Toolbox for Inspection of Music Plagiarism" by C. Dittmar, K. Hildebrand, D. Gaertner, M. Winges, F. Müller and P. Aichroth

A toolbox! For detecting music plagiarism! The authors have assembled a Batman belt of procedures in the context of the REWIND project. It tackles three types of plagiarism: sample, rhythm, and melody.

"Detection and Clustering of Musical Audio Parts Using Fisher Linear Semi-Discriminant Analysis" by T. Giannakopoulos and S. Petridis

This paper presents an approach to segmenting a musical signal using bags of frames of features (BFFs), and then Fisher linear discriminant analysis and clustering to find sections that are highly contrasting in relevant subspaces.

"Forward-Backward Search for Compressed Sensing Signal Recovery" by N. B. Karahanoglu and H. Erdogan

The idea here is nice. Expand your support set by some number of elements, and then reduce it by that number less one. The expansion is done by selecting the elements with the largest correlation with the residual; the shrinkage is done by removing the elements with the smallest correlation with the new residual. The experiments are run at a small problem size with increasing sparsity, and we see this algorithm performs favorably compared to OMP and SP --- though "exact reconstruction rate" is never defined. It would be interesting to see simulations using Bernoulli-Rademacher signals. One relevant publication missing from the references is: M. Andrle and L. Rebollo-Neira, "Improvement of orthogonal matching pursuit strategies by backward and forward movements," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 313-316, Toulouse, France, Apr. 2006. In that work, however, they apply this forward-backward business at the end of the pursuit.

"Fusion of Greedy Pursuits for Compressed Sensing Signal Reconstruction" by S. K. Ambat, S. Chatterjee and K. Hari

The idea is simple yet effective. Take two greedy pursuits, run them both, and combine their results to find one that is better. The experiments show favorable robustness to the distribution underlying sparse signals, as well as to noise. The cost is, of course, increased computation; but if it works, it works. I had a similar idea a while ago, but reviewers didn't like it. I remember one review said it was "too obvious." This fusion framework provides a nice way to get around the problem of selecting the best support of a single solver.

"Use of Tight Frames for Optimized Compressed Sensing" by E. Tsiligianni, L. Kondi and A. Katsaggelos

This paper adapts an approach by Elad for building Grassmannian sensing matrices for compressed sensing, and shows that they outperform random sensing matrices with respect to mean squared reconstruction error.

"A Comparison of Termination Criteria for A*OMP" by N. B. Karahanoglu and H. Erdogan

I need to read about this A*OMP. It has been on my to-do list for a long time.

"Classification From Compressive Representations of Data" by B. Coppa, R. Héliot, D. David and O. Michel

To what extent does compressive sampling hurt discriminability? This paper experiments with it in fundamental ways to clearly show that more measurements lead to fewer errors.
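The setup is easy to reproduce in miniature: project the data through a random sensing matrix and classify the compressive representations. Here is a toy sketch of my own construction, with a nearest-centroid classifier standing in for whatever the authors actually use:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian classes in high dimension (a toy stand-in
# for the paper's data; all names and parameters are my own choices).
d, n = 200, 400
X = np.vstack([rng.normal(0.0, 1.0, (n, d)),
               rng.normal(1.0, 1.0, (n, d))])
y = np.r_[np.zeros(n), np.ones(n)]

def error_with_m_measurements(m):
    """Project onto m random measurements, then classify by nearest centroid."""
    A = rng.normal(size=(m, d)) / np.sqrt(m)   # random sensing matrix
    Z = X @ A.T                                # compressive representations
    train = rng.permutation(2 * n)[:n]         # random half for training
    test = np.setdiff1d(np.arange(2 * n), train)
    c0 = Z[train][y[train] == 0].mean(axis=0)
    c1 = Z[train][y[train] == 1].mean(axis=0)
    pred = (np.linalg.norm(Z[test] - c1, axis=1)
            < np.linalg.norm(Z[test] - c0, axis=1)).astype(float)
    return float(np.mean(pred != y[test]))

errs = [error_with_m_measurements(m) for m in (2, 10, 50)]
```

On this toy problem the error rate drops as the number of measurements grows, consistent with the paper's conclusion.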

Hello, and welcome to brief presentations of some papers that are somehow relevant to my current research interests. Since there are so many interesting papers, I only take a cursory look over a few and jot down some notes --- which are probably inaccurate but might help me later when I need to find something I read.

"An Ellipsoid-Based, Two-Stage Screening Test for BPDN" by L. Dai and K. Pelckmans

From the title I first thought that BPDN was some malady; but it is the familiar "basis pursuit denoising" algorithm. Essentially, this paper presents an interesting way to reduce the computational cost of BPDN by performing a "screening" first to find elements that are highly likely to be zero. Previous work in this area comes from: Z. J. Xiang, H. Xu, P. J. Ramadge, "Learning sparse representations of high dimensional data on large scale dictionaries," NIPS 2011; Z. J. Xiang, P. J. Ramadge, "Fast LASSO screening tests based on correlations," ICASSP 2012; and L. E. Ghaoui, V. Viallon, T. Rabbani, "Safe feature elimination in sparse supervised learning," Arxiv preprint arXiv:1009.3515, 2010. The figures in this paper are useless, so I can't really conclude anything without more closely reading the paper. :) This is a good opportunity for a public service announcement: Please, sympathize with readers who really want to understand your work: create figures that advertise and do not antagonize.

"Online One-Class Machines Based on the Coherence Criterion" by Z. Noumir, P. Honeine and C. Richard

This is the first time I have heard of one-class classification. My first thought: what's the use? But then I thought, well, that is just detection. It is or it isn't. But after reading some more, I see it is the deeper problem of finding a way to detect an apple while knowing only apples and not other fruit. This paper builds a method for online learning of a one-class SVM using the coherence to limit the number of support vectors. It appears that elements will only be added to the dictionary of support vectors if they are all incoherent, which is simply to say they point in directions that are relatively orthogonal. It is impressive to see that the learning is two orders of magnitude faster than using the one-class SVM. Good work!

"Multi-sensor Joint Kernel Sparse Representation for Personnel Detection" by N. Nguyen, N. Nasrabadi and T. D. Tran

This paper appears to reinvent kernel sparse representation: P. Vincent and Y. Bengio, "Kernel matching pursuit", Machine Learning, vol. 48, no. 1, pp. 165-187, July 2002. It does extend this approach to joint sparse representation, and applies it to an interesting problem.
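For reference, kernel matching pursuit greedily selects kernel columns to fit a target. Here is a minimal sketch in the spirit of Vincent and Bengio; the RBF toy data and the OMP-style refitting are my own choices, not taken from either paper:

```python
import numpy as np

def kernel_matching_pursuit(K, y, n_atoms):
    """Greedy kernel matching pursuit: at each step pick the kernel column
    most correlated with the residual, then refit all selected coefficients
    by least squares (an OMP-style refit)."""
    r = y.copy()
    support, coef = [], np.array([])
    for _ in range(n_atoms):
        scores = np.abs(K.T @ r) / np.linalg.norm(K, axis=0)
        scores[support] = -np.inf           # do not reselect an atom
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(K[:, support], y, rcond=None)
        r = y - K[:, support] @ coef
    return support, coef

# Toy problem: the target is an exact combination of two kernel atoms.
X = np.linspace(-1, 1, 20).reshape(-1, 1)
K = np.exp(-((X - X.T) ** 2) / 0.02)        # RBF Gram matrix; columns are atoms
y = 2.0 * K[:, 3] - 1.0 * K[:, 17]
support, coef = kernel_matching_pursuit(K, y, 2)
```

With distinct, well-separated atoms the pursuit recovers the generating atoms and their weights exactly.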

After an extremely busy (paper deadlines, grant writing, job applications and interviews, class preparation) and entertaining (vacations in London and Jutland, the Olympics, collecting everything Disco) summer, I am off tomorrow to attend EUSIPCO 2012 in the lovely city of Bucharest in Romania (which will apparently be very warm).

This year I am presenting three papers:

- B. L. Sturm, "When 'exact recovery' is exact recovery in compressed sensing simulation," Proc. European Signal Processing Conference, Bucharest, Romania, Aug. 2012.
- B. L. Sturm and M. G. Christensen, "Comparison of Orthogonal Matching Pursuit Implementations," Proc. European Signal Processing Conference, Bucharest, Romania, Aug. 2012.
- P. Noorzad and B. L. Sturm, "Regression with sparse approximations of data," Proc. European Signal Processing Conference, Bucharest, Romania, Aug. 2012.

This looks like a really rewarding thing to solve, but after listening to the sounds myself while viewing the labels, I am not sure it is so solvable with audio features alone. Still, I might try a little something to see what happens.

Hello, and welcome to Paper of the Day (Po'D): Semantic gap?? Schemantic Schmap!! Edition. Today's paper provides an interesting argument for what is necessary to push forward the field of "Music Information Retrieval": G. A. Wiggins, "Semantic gap?? Schemantic Schmap!! Methodological Considerations in the Scientific Study of Music," Proc. IEEE Int. Symp. Multimedia, pp. 477-482, San Diego, CA, Dec. 2009.

My one line summary of this work is:

The sampled audio signal is only half of half of half of the story.

I just came across these two interesting issued patents on concatenative synthesis:

- S. Basu, I. Simon, D. Salesin, M. Agrawala, A. Sherwani, and C. Gibson, "Creating Music via Concatenative Synthesis," U.S. Patent 7,737,354, filed June 15, 2006, and issued June 15, 2010.
- T. Jehan, "Creating Music by Concatenative Synthesis," U.S. Patent 7,842,874, filed June 15, 2007, and issued November 30, 2010.

Bob L. Sturm, Associate Professor

Audio Analysis Lab

Aalborg University Copenhagen

A.C. Meyers Vænge 15

DK-2450 Copenhagen SV, Denmark

Email: bst_at_create.aau.dk