Recently in Signal Processing Category

The paper, R. B. Dannenberg, B. Thom, and D. Watson, "A machine learning approach to musical style recognition," in Proc. International Computer Music Conf., Thessaloniki, Greece, Sep. 1997, is regarded as the first to explore something like recognizing the genre of a musical signal; it proposes a system to determine the playing style of a musician. However, I have just discovered the following fascinating paper: K.-P. Han, Y.-S. Park, S.-G. Jeon, G.-C. Lee, and Y.-H. Ha, "Genre classification system of TV sound signals based on a spectrogram analysis," IEEE Transactions on Consumer Electronics, vol. 44, pp. 33-42, Feb. 1998. In that paper, they look at discriminating between speech and music, and between the Jazz, Classical, and Popular genres. Not only do they simulate the algorithm, they actually implement the system using circuits and show the results. They also list the musical pieces they put in each genre dataset. Was Kansas Popular in 1998?

Music genre taxonomy

[Figure: music genre taxonomy (genretax.png)] From: J. G. A. Barbedo and A. Lopes, "Automatic genre classification of musical signals," EURASIP Journal on Advances in Signal Processing, 2007.

The authors specify the meaning of each of these labels. For instance, "Dance" music has "strong percussive elements and very marked beating." Stemming from "Dance" there is "Jazz," "characterized by the predominance of instruments like piano and saxophone. Electric guitars and drums can also be present; vocals, when present, are very characteristic." And stemming from "Jazz" (which itself stems from "Dance") there is "Cool," a "jazz style [that is] light and introspective, with a very slow rhythm." The genres "Techno" and "Disco" (both of which emphasize the importance of listening with your body and feet) do not stem from "Dance," but instead from "Pop/Rock," "the largest class, including a wide variety of songs."

Props to the authors for attempting the impossible, but any taxonomy of music genre must be broken from the very first stem. Genres are not like species, and cannot be arranged like so. (On the plus side, it appears that differentiating introspective music from non-introspective music requires only four spectral features computed over 21.3 ms windows.)
And now for the last installment.

"Clustering Before Training Large Datasets - Case Study: K-SVD" by C. Rusu
Save computation by preprocessing the training data to reduce its size, and then apply K-SVD to learn a dictionary.
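The general recipe is easy to sketch. Here is a minimal illustration in Python, assuming scikit-learn and using MiniBatchDictionaryLearning as a stand-in for K-SVD (which scikit-learn does not provide); this is my reading of the idea, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 64))      # 10k training vectors of dimension 64

# Step 1: shrink the training set by clustering and keeping only the centroids.
kmeans = KMeans(n_clusters=500, n_init=3, random_state=0).fit(X)
X_reduced = kmeans.cluster_centers_       # 500 representative vectors

# Step 2: learn a dictionary from the reduced set (K-SVD stand-in).
dico = MiniBatchDictionaryLearning(n_components=128, transform_algorithm='omp',
                                   transform_n_nonzero_coefs=5, random_state=0)
D = dico.fit(X_reduced).components_       # 128 atoms of dimension 64
```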
"Binarization of Consensus Partition Matrix for Ensemble Clustering" by B. Abu-Jamous, R. Fa, A. Nandi and D. Roberts
Take a dataset, cluster it in multiple ways, and then combine these results using a consensus procedure. I think that could be very useful for what I am about to do. Two relevant references are: A. Weingessel, E. Dimitriadou, and K. Hornik, "An ensemble method for clustering," in DSC 2003 Working Papers, 2003; and H. G. Ayad and M. S. Kamel, "On voting-based consensus of cluster ensembles," Pattern Recognition, vol. 43, pp. 1943-1953, 2010.
"Iterated Sparse Reconstruction for Activity Estimation in Nuclear Spectroscopy" by Y. Sepulcre and T. Trigano
This paper presents an approach to sparse decomposition that applies LARS to solve the LASSO for a given regularization parameter, and then decreases the parameter. Here they are interested only in estimating the mean number of arrivals per unit time. I also see I should read T. Zhang, "Adaptive forward-backward greedy algorithm for learning sparse representations," IEEE Transactions on Information Theory, vol. 57, no. 7, pp. 4689-4708, 2011.
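My reading of the general mechanism, sketched with scikit-learn's coordinate-descent Lasso (the paper uses LARS and a model specific to pile-up in spectroscopy, so take this only as an illustration of decreasing the regularization with warm starts; the matrix and support below are made up):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 1000))   # stand-in dictionary of shifted pulse shapes
y = A[:, [10, 400, 873]] @ np.array([2.0, 1.5, 3.0]) + 0.01 * rng.standard_normal(200)

lasso = Lasso(alpha=1.0, fit_intercept=False, warm_start=True, max_iter=5000)
for alpha in np.geomspace(1.0, 1e-3, 10):   # decreasing regularization parameter
    lasso.alpha = alpha
    lasso.fit(A, y)                          # warm-started from the previous solution
    support = np.flatnonzero(lasso.coef_)
    print(f"alpha={alpha:.4f}: {support.size} active atoms")
```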
"An Analysis Prior Based Decomposition Method for Audio Signals" by O. Akyildiz and I. Bayram
This paper proposes an approach to decomposing an audio signal into transient and tonal components. Ilker Bayram makes his code available online. Aside from using the analysis formulation, it is essentially the same as in K. Siedenburg and M. Dörfler, "Structured sparsity for audio signals," in Proc. Int. Conf. Digital Audio Effects, 2011. It is also quite close to L. Daudet, "Sparse and Structured Decompositions of Signals with the Molecular Matching Pursuit," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 5, pp. 1808-1816, Sep. 2006. Also quite close are: B. L. Sturm, J. J. Shynk, and S. Gauglitz, "Agglomerative clustering in sparse atomic decompositions of audio signals," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., (Las Vegas, NV), pp. 97-100, Apr. 2008; and B. L. Sturm, J. J. Shynk, A. McLeran, C. Roads, and L. Daudet, "A comparison of molecular approaches for generating sparse and structured multiresolution representations of audio and music signals," in Proc. Acoustics, (Paris, France), pp. 5775-5780, June 2008.
"Low Complexity Approximate Cyclic Adaptive Matching Pursuit" by A. Onose and B. Dumitrescu
I think this paper presents a sparse approximation algorithm, but it seems so strongly tied to estimating a slowly-varying FIR filter that it might not generalize. The paper cites the original cyclic matching pursuit work by Christensen and Jensen 2007, but does not say how the presented algorithm is different. This is probably stated in: A. Onose and B. Dumitrescu, "Cyclic Adaptive Matching Pursuit," in Proc. ICASSP, Kyoto, Japan, Mar. 2012.
"Audio Source Separation Informed by Redundancy with Greedy Multiscale Decompositions" by M. Moussallam, G. Richard and L. Daudet
The paper presents the "jointly adaptive matching pursuit" to decompose audio mixtures. Is it a generalized version of: R. Gribonval, "Sparse decomposition of stereo signals with matching pursuit and application to blind separation of more than two sources from a stereo mixture," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, (Orlando, FL), pp. 3057-3060, May 2002?
"Adaptive Distance Normalization for Real-time Music Tracking" by A. Arzt, G. Widmer and S. Dixon
Combining spectral and onset features, with the appropriate changes to the distance measure, significantly helps music alignment and real-time tracking.
"Assessment of Subjective Audio Quality From EEG Brain Responses Using Time-Space-Frequency Analysis" by C. Creusere, J. Kroger, S. Siddenki, P. Davis and J. Hardin
This idea is very intriguing. Forget asking people about perceptual quality; just have them bring their brains in for testing. The experiments appear to change the quality of the audio either by lowpass filtering or by scaling something, and then EEG measurements are classified. After several passes, I can't understand what is happening; but music by Beethoven and Blondie is involved.
"Catalog-Based Single-Channel Speech-Music Separation with the Itakura-Saito Divergence" by C. Demir, A. T. Cemgil and M. Saraclar
The catalog consists of those jingles that will interfere with speech signals on, e.g., news channels. This approach can significantly decrease the word error rate for automatic speech recognition.
So many papers, so little time.

"Music Structure Analysis by Subspace Modeling" by Y. Panagakis and C. Kotropoulos
This paper applies subspace clustering of beat-aligned auditory temporal modulation features to extract the structure of musical signals. This is an interesting unsupervised method for discovering structure. Subspace clustering is reviewed in R. Vidal, "Subspace clustering," IEEE Signal Processing Magazine, vol. 28, no. 2, pp. 52-68, 2011. It is originally proposed in B. V. Gowreesunker and A. H. Tewfik, "A novel subspace clustering method for dictionary design," in Proc. ICA, 2009, vol. 5441, pp. 34-41; B. V. Gowreesunker and A. H. Tewfik, "A shift tolerant dictionary training method," presented at Signal Processing with Adaptive Sparse Structured Representations (SPARS), Saint Malo, France, 2009, INRIA Rennes-Bretagne Atlantique; and B. V. Gowreesunker and A. H. Tewfik, "Learning sparse representation using iterative subspace identification," IEEE Transactions on Signal Processing, vol. 58, no. 6, June 2010.
"A Framework for Fingerprint-Based Detection of Repeating Objects in Multimedia Streams" by S. Fenet, M. Moussallam, Y. Grenier, G. Richard and L. Daudet
Take the Shazam fingerprint method, and generate the anchors by a matching pursuit that emphasizes atom diversity instead of peak finding in a spectrogram.
"Evolutionary Feature Generation for Content-based Audio Classification and Retrieval" by T. Mäkinen, S. Kiranyaz, J. Pulkkinen and M. Gabbouj
An approach to optimizing features using particle swarms? I like it already.
"Robust Retina-based Person Authentication Using the Sparse Classifier" by A. Condurache, J. Kotzerke and A. Mertins
Sparse representation classification goes CSI. The paper does not mention the computational approach used to find the sparse representations.
"Gammatone Wavelet Features for Sound Classification in Surveillance Applications" by X. Valero and F. Alías
The paper proposes new features for discriminating between different sounds, such as dogs barking, people talking or screaming, guns shooting, feet stepping, and thunder clapping. The features are perceptually motivated; but why handicap a classification or estimation system with human limitations?
"Daily Sound Recognition Using a Combination of GMM and SVM for Home Automation" by M. A. Sehili, D. Istrate, B. Dorizzi and J. Boudy
Somehow GMMs and SVMs are combined with sequence discrimination. I need to read this more closely since it looks really interesting.
"Enhancing Timbre Model Using MFCC and Its Time Derivatives for Music Similarity Estimation" by F. de Leon and K. Martinez
In past work, MFCC features have often been concatenated with their first and second time derivatives (delta and delta-delta MFCCs). This paper looks at the effect on classification of treating these separately using bags of frames of features (BFFs). Genre classification of musical signals shows differences between these approaches. But shouldn't a proper scaling of the dimensions work just as well?
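For comparison, the usual concatenation is trivial to compute; here is a sketch with librosa (the example audio and the default analysis settings are placeholders, not those of the paper):

```python
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex('trumpet'))          # any audio file will do
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 x n_frames
d1 = librosa.feature.delta(mfcc)                     # first time derivative
d2 = librosa.feature.delta(mfcc, order=2)            # second time derivative

# The common approach: one 39-dimensional feature vector per frame.
concatenated = np.vstack([mfcc, d1, d2])

# The paper's approach, as I read it: keep three separate bags of frames.
separate_bags = {'mfcc': mfcc, 'delta': d1, 'delta2': d2}
```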
"Classification of Audio Scenes Using Narrow-Band Autocorrelation Features" by X. Valero and F. Alías
This paper proposes treating separately the bands of a multiband decomposition of audio signals. The four low-level features extracted come from the autocorrelation of each separate band. These low-level features are tested in discriminating between acoustic environments (such as classroom and library).
"Large Scale Polyphonic Music Transcription Using Randomized Matrix Decompositions" by I. Ari, U. Simsekli, A. T. Cemgil and L. Akarun
This looks like very fine work employing randomized factorizations that can handle large datasets. The paper points to P. Smaragdis, "Polyphonic Pitch Tracking by Example," in Proc. IEEE WASPAA, pp. 125-128, 2011.
"Searching for Dominant High-Level Features for Music Information Retrieval" by M. Zanoni, D. Ciminieri, A. Sarti and S. Tubaro
This paper attacks the problem of making features more discriminable by clustering, and tests the new methods in the context of genre recognition. Groups of music excerpts associated with features that are near cluster centroids are evaluated by humans in semantic terms, which shows some interesting high-level properties, e.g., music that is "Groovy" or "Classic."
"AM-FM Modulation Features for Music Instrument Signal Analysis and Recognition" by A. Zlatintsi and P. Maragos
This paper describes applying perceptual-based features to identifying musical instruments. Tests on monophonic instrument recordings (is the IOWA dataset of isolated notes, or musical phrases?) give good results.
"Analysis of Speaker Similarity in the Statistical Speech Synthesis Systems Using a Hybrid Approach" by E. Guner, A. Mohammadi and C. Demiroglu
This work will be useful for the iPad game I will create someday.
"A Geometrical Stopping Criterion for the LAR Algorithm" by C. Valdman, M. Campos and J. Apolinário Jr.
The paper applies something called a Volterra filter to determine when to stop LAR. I have heard LARS is related to subspace pursuit, so this paper could provide some interesting ideas.
"Signal Compression Using the Discrete Linear Chirp Transform (DLCT)" by O. Alkishriwo and L. Chaparro
This paper proposes using a chirp transform for audio compression. Essentially, the algorithm estimates chirp parameters and an amplitude for each frame. The authors apply this algorithm to speech and bird sounds, and compare its performance to that of compressed sensing of the audio. This makes no sense to me. That is like racing a red Ferrari against a red tomato, which makes no sense to me either. The paper says, "[bird song] is sparser in the time domain than in the frequency domain." What?? The figures are useless, but the description claims the chirp transform method is better.
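If I read it right, the per-frame model is essentially a linear chirp with an amplitude. A toy sketch of estimating one chirp per frame by grid search over initial frequency and chirp rate (my own illustration of the idea, not the DLCT itself; the grid ranges are arbitrary):

```python
import numpy as np

def best_chirp(frame, sr, n_f0=64, n_rates=32):
    """Grid search for the linear chirp best correlated with a signal frame."""
    n = frame.size
    t = np.arange(n) / sr
    f0s = np.linspace(0.0, sr / 2, n_f0)                # initial frequencies (Hz)
    rates = np.linspace(-2000.0, 2000.0, n_rates)       # chirp rates (Hz per second)
    best = (0.0, 0.0, 0j)
    for f0 in f0s:
        for c in rates:
            atom = np.exp(2j * np.pi * (f0 * t + 0.5 * c * t**2))
            atom /= np.linalg.norm(atom)
            amp = np.vdot(atom, frame)                   # complex amplitude estimate
            if abs(amp) > abs(best[2]):
                best = (f0, c, amp)
    return best                                          # (f0, chirp rate, amplitude)
```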
Continuing with some papers selected from EUSIPCO 2012.

"How to Use Real-Valued Sparse Recovery Algorithms for Complex-Valued Sparse Recovery?" by A. Sharif-Nassab, M. Kharratzadeh, M. Babaie-Zadeh and C. Jutten
This appears to be a very nice practical paper. It shows that, as long as the sparsity is a quarter of the spark of the dictionary, one need not solve a second-order cone problem for complex sparse recovery using error-constrained \(\ell_1\)-minimization, but can instead pose it as a linear program. The paper also corrects a few misstatements in the literature.
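As a reminder of the standard real embedding behind such results, here is a minimal sketch of equality-constrained basis pursuit for a complex system posed as a real linear program with scipy (the paper treats the error-constrained case and gives the precise conditions, so this only conveys the flavor of the idea):

```python
import numpy as np
from scipy.optimize import linprog

def complex_bp_as_real_lp(A, y):
    """Basis pursuit for complex A x = y, posed as a real linear program."""
    m, n = A.shape
    # Stack the complex system into an equivalent real one.
    Ar = np.block([[A.real, -A.imag],
                   [A.imag,  A.real]])            # (2m) x (2n)
    yr = np.concatenate([y.real, y.imag])
    # min ||z||_1 s.t. Ar z = yr, with z = z_pos - z_neg, z_pos, z_neg >= 0.
    c = np.ones(4 * n)
    A_eq = np.hstack([Ar, -Ar])
    res = linprog(c, A_eq=A_eq, b_eq=yr, bounds=(0, None), method='highs')
    z = res.x[:2 * n] - res.x[2 * n:]
    return z[:n] + 1j * z[n:]                     # back to complex coefficients
```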
"A Greedy Algorithm to Extract Sparsity Degree for l1/l0-equivalence in a Deterministic Context" by N. Pustelnik, C. Dossal, F. Turcu, Y. Berthoumieu and P. Ricoux
This paper recalls the polytope interpretation underlying the work of Donoho and Tanner to find the class of signals that are not recoverable by error-constrained \(\ell_1\)-minimization from compressed sampling with a deterministic sensing matrix. This paper definitely deserves a deeper read, and reminds me to return to some work from: M. D. Plumbley, "On polar polytopes and the recovery of sparse representations," IEEE Trans. Info. Theory, vol. 53, no. 9, pp. 3188-3195, Sep. 2007.
"Choosing Analysis or Synthesis Recovery for Sparse Reconstruction" by N. Cleju, M. Jafari and M. Plumbley
This paper explores where the analysis and synthesis approaches to sparse recovery differ. We see that with more measurements, the synthesis formulation becomes better for sparser signals, and the analysis formulation better for cosparser signals. Furthermore, the analysis formulation is more sensitive to signals that are only approximately sparse. This is a nice paper with a strong empirical component.
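For my own reference, the two error-constrained formulations being compared (in my notation: \(M\) the measurement matrix, \(D\) a synthesis dictionary, \(\Omega\) an analysis operator):

\[ \hat{z} = \arg\min_z \|z\|_1 \;\; \text{s.t.} \;\; \|y - M D z\|_2 \le \epsilon, \quad \hat{x} = D\hat{z} \quad \text{(synthesis)} \]
\[ \hat{x} = \arg\min_x \|\Omega x\|_1 \;\; \text{s.t.} \;\; \|y - M x\|_2 \le \epsilon \quad \text{(analysis)} \]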
"CoSaMP and SP for the Cosparse Analysis Model" by R. Giryes and M. Elad
On the heels of the last paper, this one adapts CoSaMP and SP to the Analysis formulation. The paper also presents a nice table summarizing the synthesis and analysis formulations.
"Matching Pursuit with Stochastic Selection" by T. Peel, V. Emiya, L. Ralaivola and S. Anthoine
In order to accelerate matching pursuit, this work takes a random subset of the dictionary at each iteration (as in previous work), and also a random subset of the dimensions. Thus, it need not compute full inner products. They show good approximation ability at a smaller computational price.
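A toy sketch of how I understand the selection step (my code, not the authors'; unit-norm dictionary columns are assumed):

```python
import numpy as np

def stochastic_mp(D, x, n_iter=50, atom_frac=0.25, dim_frac=0.25, seed=0):
    """Matching pursuit selecting over random subsets of atoms and dimensions."""
    rng = np.random.default_rng(seed)
    m, n = D.shape
    residual = x.copy()
    coeffs = np.zeros(n)
    for _ in range(n_iter):
        atoms = rng.choice(n, size=max(1, int(atom_frac * n)), replace=False)
        dims = rng.choice(m, size=max(1, int(dim_frac * m)), replace=False)
        # Partial inner products over the sampled dimensions only.
        partial = D[np.ix_(dims, atoms)].T @ residual[dims]
        k = atoms[np.argmax(np.abs(partial))]
        # Full inner product only for the selected atom's coefficient update.
        alpha = D[:, k] @ residual
        coeffs[k] += alpha
        residual -= alpha * D[:, k]
    return coeffs, residual
```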
"Robust Greedy Algorithms for Compressed Sensing" by S. A. Razavi, E. Ollila and V. Koivunen
This paper presents modifications to OMP and CoSaMP wherein M-estimates are used to help guard against the effect of possibly impulsive noise.
"A Fast Algorithm for the Bayesian Adaptive Lasso" by A. Rontogiannis, K. Themelis and K. Koutroumbas
This paper takes the adaptive lasso and makes it faster to apply to recovery from compressive sampling. It appears to do well with measurements in noise, for both Bernoulli-Gaussian and Bernoulli-Rademacher signals.
"Audio Forensics Meets Music Information Retrieval - A Toolbox for Inspection of Music Plagiarism" by C. Dittmar, K. Hildebrand, D. Gaertner, M. Winges, F. Müller and P. Aichroth
A toolbox! For detecting music plagiarism! The authors have assembled a Batman belt of procedures in the context of the REWIND project. It tackles three types of plagiarism: sample, rhythm, and melody.
"Detection and Clustering of Musical Audio Parts Using Fisher Linear Semi-Discriminant Analysis" by T. Giannakopoulos and S. Petridis
This paper presents an approach to segmenting a musical signal using bags of frames of features (BFFs), and then Fisher linear discriminant analysis and clustering to find sections that are highly contrasting in relevant subspaces.
"Forward-Backward Search for Compressed Sensing Signal Recovery" by N. B. Karahanoglu and H. Erdogan
The idea here is nice. Expand your support set by some number of elements, and then reduce it by that number less one. The expansion is done by selecting the elements with the largest correlation with the residual; the shrinkage is done by removing the elements with the smallest correlation with the new residual. The experiments are run at a small problem size with increasing sparsity, and we see this algorithm performs favorably compared to OMP and SP --- though "exact reconstruction rate" is never defined. It would be interesting to see simulations using Bernoulli-Rademacher signals. One relevant publication missing from the references is: M. Andrle and L. Rebollo-Neira, "Improvement of Orthogonal Matching Pursuit Strategies by Backward and Forward Movements", Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 313-316, Toulouse, France, Apr. 2006. In that work, however, they apply this forward-backward business at the end of the pursuit.
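A toy sketch of the expand-then-shrink iteration, as I read it (my code, assuming unit-norm dictionary columns; in the backward step I drop the atoms with the smallest least-squares coefficients, which is how I interpret the description):

```python
import numpy as np

def forward_backward_search(D, y, k_max, L=3, tol=1e-6):
    """Greedy pursuit sketch: expand the support by L atoms, then shrink by L-1."""
    n = D.shape[1]
    support = np.array([], dtype=int)
    residual = y.copy()
    coefs = np.array([])
    while support.size < k_max and np.linalg.norm(residual) > tol:
        # Forward: add the L atoms most correlated with the residual.
        corr = np.abs(D.T @ residual)
        corr[support] = -np.inf
        support = np.concatenate([support, np.argsort(corr)[-L:]])
        coefs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        # Backward: drop the L-1 atoms with the smallest coefficient magnitudes.
        keep = np.argsort(np.abs(coefs))[L - 1:]
        support = support[keep]
        coefs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coefs
    x = np.zeros(n)
    x[support] = coefs
    return x
```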
"Fusion of Greedy Pursuits for Compressed Sensing Signal Reconstruction" by S. K. Ambat, S. Chatterjee and K. Hari
The idea is simple yet effective. Take two greedy pursuits, run them both, and combine their results to find one that is better. The experiments show favorable robustness to the distribution underlying sparse signals, as well as to noise. The cost is, of course, increased computation; but if it works, it works. I had a similar idea a while ago, but reviewers didn't like it. I remember one review said it was "too obvious." This fusion framework provides a nice way to get around the problem of selecting the best support of a single solver.
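The fusion step itself is only a few lines (a sketch under my reading; here x1 and x2 would be the outputs of, say, OMP and SP, and k the target sparsity):

```python
import numpy as np

def fuse_estimates(D, y, x1, x2, k):
    """Fuse two sparse estimates by least squares on the union of their supports."""
    union = np.union1d(np.flatnonzero(x1), np.flatnonzero(x2))
    coefs, *_ = np.linalg.lstsq(D[:, union], y, rcond=None)
    # Keep only the k atoms with the largest coefficients on the union support.
    keep = union[np.argsort(np.abs(coefs))[-k:]]
    x, *_ = np.linalg.lstsq(D[:, keep], y, rcond=None)
    x_fused = np.zeros(D.shape[1])
    x_fused[keep] = x
    return x_fused
```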
"Use of Tight Frames for Optimized Compressed Sensing" by E. Tsiligianni, L. Kondi and A. Katsaggelos
This paper adapts an approach by Elad for building Grassmannian sensing matrices for compressed sensing, and shows that their performance is better than that of random sensing matrices with respect to mean squared reconstruction error.
"A Comparison of Termination Criteria for A*OMP" by N. B. Karahanoglu and H. Erdogan
I need to read about this A*OMP. It has been on my to-do list for a long time.
"Classification From Compressive Representations of Data" by B. Coppa, R. Héliot, D. David and O. Michel
To what extent does compressive sampling hurt discriminability? This paper experiments with this question in fundamental ways to clearly show that more measurements lead to fewer errors.
Hello, and welcome to brief presentations of some papers that are somehow relevant to my current research interests. Since there are so many interesting papers, I only take a cursory look over a few and jot down some notes --- which are probably inaccurate but might help me later when I need to find something I read.

"An Ellipsoid-Based, Two-Stage Screening Test for BPDN" by L. Dai and K. Pelckmans
From the title I first thought that BPDN was some malady; but it is the familiar "basis pursuit denoising" problem. Essentially, this paper presents an interesting way to reduce the computational cost of BPDN by performing a "screening" first to find elements that are highly likely to be zero. Previous work in this area comes from: Z. J. Xiang, H. Xu, and P. J. Ramadge, "Learning sparse representations of high dimensional data on large scale dictionaries," NIPS 2011; Z. J. Xiang and P. J. Ramadge, "Fast LASSO screening tests based on correlations," ICASSP 2012; and L. E. Ghaoui, V. Viallon, and T. Rabbani, "Safe feature elimination in sparse supervised learning," arXiv preprint arXiv:1009.3515, 2010. The figures in this paper are useless, so I can't really conclude anything without reading the paper more closely. :) This is a good opportunity for a public service announcement: please sympathize with readers who really want to understand your work, and create figures that advertise rather than antagonize.
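The flavor of screening, as I recall the basic SAFE test of El Ghaoui et al. (the paper's ellipsoid-based two-stage test is a sharper variant of this kind of rule; treat the formula below as my recollection, not the paper's test):

```python
import numpy as np

def safe_screen(A, y, lam):
    """Basic SAFE screening for the lasso (my recollection of El Ghaoui et al.).

    Returns a boolean mask of the columns of A to keep; columns marked False are
    guaranteed to have zero coefficients at regularization lam.
    """
    corr = np.abs(A.T @ y)
    lam_max = corr.max()
    col_norms = np.linalg.norm(A, axis=0)
    threshold = lam - col_norms * np.linalg.norm(y) * (lam_max - lam) / lam_max
    return corr >= threshold
```

One then solves BPDN only over the retained columns and maps the solution back afterwards.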
"Online One-Class Machines Based on the Coherence Criterion," by Z. Noumir, P. Honeine and C. Richard
This is the first time I have heard of one-class classification. My first thought: what's the use? But then I thought, well, that is just detection. It is or it isn't. But after reading some more, I see it is the deeper problem of finding a way to detect an apple while only knowing apples and not other fruit. This paper builds a method for online learning of a one-class SVM, using the coherence to limit the number of support vectors. It appears that elements will only be added to the dictionary of support vectors if they are sufficiently incoherent with those already included, which is simply to say they point in directions that are nearly orthogonal. It is impressive to see that the learning is two orders of magnitude faster than using the one-class SVM. Good work!
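The admission rule, as I read it, is tiny. A sketch for a linear kernel only (the paper works with kernels generally; mu0 is a coherence threshold I made up):

```python
import numpy as np

def maybe_add_to_dictionary(dictionary, x, mu0=0.5):
    """Add unit-normalized sample x to the dictionary of support-vector candidates
    only if its coherence with every element already included is below mu0."""
    x = x / np.linalg.norm(x)
    if all(abs(d @ x) <= mu0 for d in dictionary):
        dictionary.append(x)
    return dictionary
```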
"Multi-sensor Joint Kernel Sparse Representation for Personnel Detection", by N. Nguyen, N. Nasrabadi and T. D. Tran
This paper appears to reinvent kernel sparse representation: P. Vincent and Y. Bengio, "Kernel matching pursuit", Machine Learning, vol. 48, no. 1, pp. 165-187, July 2002. It does extend this approach to joint sparse representation, and applies it to an interesting problem.

Off to EUSIPCO 2012

After an extremely busy (paper deadlines, grant writing, job applications and interviews, class preparation) and entertaining (vacations in London and Jutland, the Olympics, collecting everything Disco) summer, I am off tomorrow to attend EUSIPCO 2012 in the lovely city of Bucharest in Romania (which will apparently be very warm).

This year I am presenting three papers.
I will provide a Po'D for each of these during the next week, as well as interesting things I find along the way.
This looks like a really rewarding thing to solve, but after listening to the sounds myself while viewing the labels, I am not sure it is so solvable with audio features alone. Still, I might try a little something to see what happens.
Hello, and welcome to Paper of the Day (Po'D): Semantic gap?? Schemantic Schmap!! Edition. Today's paper provides an interesting argument for what is necessary to push forward the field of "Music Information Retrieval": G. A. Wiggins, "Semantic gap?? Schemantic Schmap!! Methodological Considerations in the Scientific Study of Music," Proc. IEEE Int. Symp. Multimedia, pp. 477-482, San Diego, CA, Dec. 2009.

My one line summary of this work is:
The sampled audio signal is only half of half of half of the story.
I just came across two interesting issued patents on concatenative synthesis. There is just one word of difference in their titles, and the latter patent was filed on the same day of the year as the former, just one year later; yet both were granted. That is strange.
