June 2010 Archives

Hello, and welcome to Paper of the Day (Po'D): The Other Other Probabilistic Matching Pursuits Edition. Today's paper is: M. Elad and I. Yavneh, "A plurality of sparse representations is better than the sparsest one alone," IEEE Trans. Info. Theory, vol. 55, pp. 4701-4714, Oct. 2009. I have broached this subject before, namely in the following (a toy sketch of the paper's central idea follows the list):
  1. A. Divekar and O. Ersoy, "Probabilistic matching pursuit for compressive sensing," tech. rep., Purdue University, West Lafayette, IN, USA, May 2010. (Po'D here.)
  2. P. J. Durka, D. Ircha, and K. J. Blinowska, "Stochastic time-frequency dictionaries for matching pursuit," IEEE Trans. Signal Process., vol. 49, pp. 507-510, Mar. 2001. (Related Po'D here.)
  3. S. E. Ferrando, E. J. Doolittle, A. J. Bernal, and L. J. Bernal, "Probabilistic matching pursuit with Gabor dictionaries," Signal Process., vol. 80, pp. 2099-2120, Oct. 2000. (Po'D here.)
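The paper's central idea, as I read it: rather than keeping only the single sparsest solution, draw several sparse representations at random and average their reconstructions, which behaves like an MMSE estimate and can denoise better than the sparsest solution alone. Below is a minimal numpy sketch of that idea; the softmax selection rule and all parameter values are my own illustrative choices, not the RandOMP distribution derived in the paper.

```python
import numpy as np

def randomized_omp(D, y, k, rng, temp=0.1):
    """One randomized greedy pass: atoms are drawn with probability
    increasing with their correlation to the residual (illustrative
    rule; not the exact distribution from Elad and Yavneh)."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(k):
        corr = np.abs(D.T @ residual)
        w = np.exp((corr - corr.max()) / temp)  # stable softmax weights
        w[support] = 0.0                        # never reselect an atom
        support.append(int(rng.choice(D.shape[1], p=w / w.sum())))
        Ds = D[:, support]
        coeffs, *_ = np.linalg.lstsq(Ds, y, rcond=None)
        residual = y - Ds @ coeffs              # project out chosen atoms
    return support, coeffs

rng = np.random.default_rng(0)
n, m, k = 64, 128, 5
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)                  # unit-norm atoms
x = np.zeros(m)
x[rng.choice(m, k, replace=False)] = 1.0
y = D @ x + 0.1 * rng.standard_normal(n)        # noisy observation

# Average the reconstructions of many randomized representations.
recons = [D[:, s] @ c for s, c in
          (randomized_omp(D, y, k, rng) for _ in range(50))]
print("one run:  ", np.linalg.norm(recons[0] - D @ x))
print("averaged: ", np.linalg.norm(np.mean(recons, axis=0) - D @ x))
```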
My wife (of five years today!) likes to joke that someday she should be a coauthor on one of my research articles because she has to deal with all the ups and downs that come with my research.

[Image: wedding photo]
Happy Anniversary Love!

Vuvuzela Filtering

The World Cup continues to be an exciting event! In the first game, the world quickly became aware of the South African spectator instrument, the vuvuzela. In the second game, the world continued to become aware of the vuvuzela, and demanded it stop, thus propelling the vuvuzela even further into the meme stratosphere. I rather like the cacophonous hum of tens of thousands of buzzing instruments; and I can only imagine what it sounds like to be on the field surrounded by such a stadium-sized instrument. (Apparently, it does not facilitate communication between the players.) Also, it is more versatile than it first appears.

Today I notice that the Danish channel has very little vuvuzela hum, while the German channel showing the same game (Slovakia vs. Italy) has a lot of it. So it appears that some feeds are attempting to filter the sound out, which is confirmed by this article. The folks at the Centre for Digital Music at Queen Mary, University of London, have a great demonstration of one adaptive way to filter out the vuvuzela hum. I wonder how the TV stations are doing it? My bet is that they are not using sparse approximation with a dictionary of vuvuzela atoms.
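For the curious, the most obvious broadcast-side trick I can think of is a small bank of notch filters at the vuvuzela's fundamental (around 233 Hz, roughly a B-flat) and its first few harmonics. Here is a minimal SciPy sketch; the 233 Hz figure, the number of harmonics, and the Q are my assumptions, not anything reported by the broadcasters:

```python
import numpy as np
from scipy.signal import iirnotch, lfilter

def devuvuzela(x, fs, f0=233.0, n_harmonics=4, q=30.0):
    """Attenuate the vuvuzela drone by notching the (assumed)
    fundamental and its first few harmonics."""
    y = x
    for h in range(1, n_harmonics + 1):
        f = h * f0
        if f >= fs / 2:                  # stay below Nyquist
            break
        b, a = iirnotch(f, q, fs=fs)
        y = lfilter(b, a, y)
    return y

# Example: a synthetic "broadcast" of a commentary tone plus drone.
fs = 22050
t = np.arange(fs * 2) / fs
drone = sum(np.sin(2 * np.pi * 233 * h * t) / h for h in (1, 2, 3))
speech_ish = np.sin(2 * np.pi * 700 * t)
cleaned = devuvuzela(drone + speech_ish, fs)
```

A fixed notch bank like this will of course also remove any commentary energy sitting at those frequencies, which is presumably why adaptive methods like the one demonstrated at the Centre for Digital Music are more attractive.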

Regardless, today I take my futbol with the hum.

Simulation Bugs?

In the Po'D from a few days ago, the authors report unbelievably good accuracy in automatic music genre recognition. The authors also published another paper on the same topic: Y. Panagakis, C. Kotropoulos, and G. R. Arce, "Music genre classification using locality preserving non-negative tensor factorization and sparse representations," in Proc. Int. Soc. Music Info. Retrieval Conf., pp. 249-254, Kobe, Japan, Oct. 2009. In that paper the authors take more of a tensor-based approach, but still use sparse representations for classification, and on the same dataset they report classification accuracies about 1% higher than in the other paper. Both papers state in their introductions, "To the best of the authors' knowledge, the just quoted genre classification accuracies are the highest ever reported for both datasets."
Hello, and welcome to the Paper of the Day (Po'D): Music Genre Classification this time with Sparse Representations Edition. I continue today with an interesting paper that applies my favorite subject, sparse representations, to a topic that rings my skeptical bells, and shakes my cynical rattles --- automatic classification of music genre: Y. Panagakis, C. Kotropoulos, and G. R. Arce, "Music genre classification via sparse representations of auditory temporal modulations," in Proc. European Signal Process. Conf., (Glasgow, Scotland), pp. 1-5, Aug. 2009.
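The general recipe of sparse representation-based classification (SRC), as I understand this family of methods: code the test feature vector over a dictionary whose atoms are the training feature vectors, then assign the class whose atoms best explain it. Here is a toy sketch with scikit-learn's OMP; it is not the authors' pipeline (their features are auditory temporal modulations, and their solver may differ):

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(D, labels, x, k=10):
    """Sparse representation-based classification: code x over the
    training dictionary D (columns = training feature vectors),
    then pick the class whose atoms give the smallest residual."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False)
    omp.fit(D, x)
    w = omp.coef_
    best_class, best_res = None, np.inf
    for c in np.unique(labels):
        w_c = np.where(labels == c, w, 0.0)   # keep class-c coefficients
        res = np.linalg.norm(x - D @ w_c)
        if res < best_res:
            best_class, best_res = c, res
    return best_class

# Toy usage with random "features" standing in for the real ones.
rng = np.random.default_rng(1)
d, n_per_class, n_classes = 40, 30, 3
D = rng.standard_normal((d, n_per_class * n_classes))
D /= np.linalg.norm(D, axis=0)
labels = np.repeat(np.arange(n_classes), n_per_class)
x = D[:, 5] + 0.05 * rng.standard_normal(d)   # near a class-0 exemplar
print(src_classify(D, labels, x))             # expect 0
```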
Hello, and welcome to the Paper of the Day (Po'D): Music Genre Classification Edition, no. 3. Today's Po'D follows on the heels of the Po'D from a few days ago: R. O. Gjerdingen and D. Perrott, "Scanning the Dial: The Rapid Recognition of Music Genres," Journal of New Music Research, vol. 37, no. 2, pp. 93-100, Spring 2008.

Human-machine mimicry

Since I was a child, I have wanted to be like that sound-effects guy in the classic Police Academy slapstick movie franchise, Michael Winslow. And now he is the subject of a fascinating film in which he reproduces the evolution of typewriter sounds over the past 100 years --- which is simultaneously banal and riveting.



This reminds me of a "research" project I have wanted to do for some time: find the parameters of a software voice model such that it reproduces a sound, just like Michael Winslow. I already have a project name: The Onomatopoeiater. I just need to convince a panel of experts that this should be done.
Hello, and welcome to Paper of the Day (Po'D): Music Genre Classification Edition, no. 2. Today's paper is so new it has yet to appear in print form: G. Marques, M. Lopes, M. Sordo, T. Langlois, and F. Gouyon, "Additional evidence that common low-level features of individual audio frames are not representative of music genres," in Proc. Sound and Music Computing, (Barcelona, Spain), July 2010. On this blog I have broached the subject of music genre recognition, and in general the discriminativenessity of bags of low-level features such as non-linear MFCCs:
  1. Paper of the Day (Po'D): Music Genre Classification Edition
     This study proposes a method of music genre classification based on modeling series of MFCCs by an autoregressive process in order to incorporate statistics on short- and long-term behaviors.
  2. Paper of the Day (Po'D): Bags of Frames of Features Edition
     This study hints at a mediocre performance limit expected when using bags of frames of features, including MFCCs, for music signal processing.
  3. Paper of the Day (Po'D): Experiments in Audio Similarity Edition
     This collection of studies looks in controlled and realistic ways at how MFCCs perform in instrument classification tasks.
  4. Paper of the Day (Po'D): Multiscale MFCCs Edition
     This study looks at incorporating scaling information into the bags of frames of features approach for musical instrument recognition tasks.

Peer Review and Publishing

Here is an interesting blog post arguing that the millions of hours of peer review performed by academics, researchers, and experts generally, which provide quality control for journals and other publications, represent a boon to the for-profit, or closed-access, publishing companies. These hours are of course compensated, but not by the publishing company: peer review is done out of responsibility and a vested interest in one's research community, whether the actual time spent comes out of research time, vacation time, or free time. However, while the author uses this argument to defend his decision to publish in, and review for, only open-access publications, I feel he misses one of the most important reasons for peer review, whatever the policies of the publishing company: feedback to authors.

I have rarely submitted a review recommending rejection that didn't result in a stronger and acceptable resubmission. And in my own submissions I have rarely received any feedback that was not helpful in making my research and its presentation more thorough, convincing, and broad. Ironically, the worst review I have received was an accept with nothing more stated; the best was a reject with an insightful explanation of why my approach was flawed (kind of like getting to peek at the back of the book for the answers to the odd-numbered questions). Also, as a peer reviewer, I usually get to see how the other reviewers see the problems, and what they found that I missed, or vice versa. I feel that this is how I am paid back for performing peer review, whether or not the publishing company provides open access. The publishers provide a recognized forum in which I can present my work to a broader audience, and with anonymity I can expect to receive a thorough and thoughtful reply from other people invested in the profession of research. (This "forum" is of course paid for by the time spent by associate editors, which I hear brings with it fame and prestige --- but I will have to wait and see what happens when that time comes for me. :)

Update 14h44: Martin responds to me:
Bob - that's a different point. I agree about the value of peer-review to the individual and the community (although I think there is an interesting discussion to be had about whether this is the only way to achieve these results now). The publishers don't do anything to facilitate this process (that is done by academics, for free). I would take your conclusion and look at it the other way - would any of the benefits you name not exist for an Open Access journal? In which case, why do the work that allows a large multinational to profit and lock away content, when you could get the same benefits and have the content open to all?
I agree that I can expect the same results to come from submitting to an open-access journal, but I am not so sure that "the publishers don't do anything to facilitate this process." It is definitely interesting to watch how information and access have become hot commodities.
Hello, and welcome to Paper of the Day (Po'D): Multiscale MFCCs Edition. Today's paper is extremely new: B. L. Sturm, M. Morvidone, and L. Daudet, "Musical instrument identification using multiscale Mel-frequency cepstral coefficients," in Proc. European Signal Process. Conf., Aalborg, Denmark, Aug. 2010. (A similar paper, though more concentrated on using sparse approximation by matching pursuit, will be published later this year: M. Morvidone, B. L. Sturm, and L. Daudet, "Incorporating scale information with cepstral features: experiments on musical instrument recognition," Pattern Recogn. Lett., 2010 (in press).)

NB: I am related to the first author of this paper.
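In brief, the multiscale idea is to compute cepstral features over several analysis window lengths and stack them, so that both fast and slow spectral behavior is captured. A toy sketch with librosa follows; the window sizes and the mean/std pooling are illustrative choices, not necessarily those of the paper:

```python
import numpy as np
import librosa

def multiscale_mfcc(y, sr, n_fft_list=(512, 1024, 2048, 4096), n_mfcc=13):
    """Stack MFCCs computed at several window scales (illustrative;
    the paper's exact scales and pooling may differ)."""
    feats = []
    for n_fft in n_fft_list:
        m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                 n_fft=n_fft, hop_length=n_fft // 2)
        # Summarize each scale by its per-coefficient mean and std.
        feats.append(np.concatenate([m.mean(axis=1), m.std(axis=1)]))
    return np.concatenate(feats)

y, sr = librosa.load(librosa.example("trumpet"))
print(multiscale_mfcc(y, sr).shape)   # 4 scales * 2 stats * 13 = (104,)
```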

Dear Dr. Oblivious

NB: I am completely snowed under with preparing two camera-ready papers, reading and grading final project reports, and staying out of the rain, so today I offer a brief piece of advice I wrote a while ago and saved for an occasion such as this.


Dear Dr. Oblivious,

I have been going through my deceased mother's kitchen and have become increasingly annoyed by her habit of not labeling her spice jars. She has two large urns, one containing white sugar and the other salt. But I have no idea which is which; and I don't want to just throw them out. What should I do?

Sweet or savory, or savory or sweet -- I just don't know.

Dear "Sweet or savory, or savory or sweet -- I just don't know,"

I am sorry to hear about your mother's penchant for not labeling her spices. Telling the difference between these two chemicals is extremely difficult and she has left you with what could be a considerable task. Table salt, or sodium chloride (NaCl), looks exactly like sugar; and sugar, or sucrose, looks exactly like sodium chloride (table salt). See Fig. 1.

[Image: two piles of white crystals, sugar and salt]
Figure 1: Which one is salt and which one is sugar? It is extremely difficult to tell just by looking, even through a magnifying glass!

But do not worry! Here is a quick way that you can tell the difference. You will need: 1 Bunsen burner with ethanol fuel, 1 inert wire loop, 2 scoopulas, 1 rubber policeman, 5 ml of 10% hydrochloric acid (HCl) solution, 10 ml of deionized water, and two 10 ml graduated cylinders. Since one cannot emphasize safety in the laboratory enough, you will also need the following: 1 pair of safety goggles, 1 full-length lab coat with long sleeves, 1 chemical hood, 1 pair of high-temperature gloves, 1 pair of clean-room shoe covers, and 1 face mask with vapor filters. If you have long hair, make sure it is pulled back.

After you have donned the safety gear, move your unknown chemicals and the rest of the equipment under the chemical hood. Pour 1 ml of deionized water into each of the empty 10 ml graduated cylinders. With one scoopula, transfer a few pinches of one unknown (the salt or the sugar) into one of the cylinders, and with the other scoopula a few pinches of the other unknown into the other. Mix each solution thoroughly.

Set up the Bunsen burner and turn it on. Make sure the flame burns nearly colorlessly. Take the inert wire loop and submerge its end in the hydrochloric acid. Then rinse it off in the remaining deionized water. Place the metal loop in the flame and heat it until the flame becomes colorless.

Once the wire loop is clean and has cooled, dip it into the first unknown solution to be tested. Place the loop into the hottest part of the flame, and observe and record the change in the flame's color in your properly dated lab book (Fig. 2).

[Image: a chemistry student performing a flame test]
Figure 2: A chemistry student demonstrates proper technique for "flame testing," except she is missing one person to watch out for "air in the lines," another person at the ready with a Class C fire extinguisher, and a third person to take notes.

If your unknown solution contains sodium ions, then the flame will burn a bright orange-red; if it is the sugar solution, the flame's color will not change. After testing the first unknown, dip the wire loop into the 10% HCl solution, rinse it in the deionized water, and heat it in the flame until the flame again becomes colorless.

Now submerge the wire loop in the other unknown solution and place it in the flame. If this solution produces an orange-red flame, then you know it contains sodium ions.

And that is it! Through the wonder and ease of modern chemistry, you have now determined which of your mother's urns contains sodium chloride, and which doesn't. HOWEVER, just because you have determined that one urn contains a substance that produces sodium ions when dissolved in H2O, can you be absolutely certain that the other contains sugar? From our little experiment, and the wonderful problem of induction, you only know that the other urn does not contain salt, or at least did not when you tested it. There are any number of things the non-salt urn could contain, including strychnine, anthrax, and crack cocaine -- all of which look exactly like sugar. To really make sure it is sugar, and that the other urn contains sodium chloride, you will need the following: 2 rubber policemen, 1 centrifuge, and 1 mass spectrometer. We will return to this problem at a later date.

Happy spicing!

Do you have an obvious question for Dr. Oblivious? Send yours in today for some overly complex and utterly unhelpful answers.
Hello, and welcome to Paper of the Day (Po'D): Experiments in Audio Similarity Edition. Today's paper is: J. H. Jensen, M. G. Christensen, D. P. W. Ellis, and S. H. Jensen, "Quantitative analysis of a common audio similarity measure," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, pp. 693-703, May 2009.

Thank you to the first author for clarifying many of my questions!

It is somewhat taken for granted that features that work extremely well for problems in speech recognition and speaker identification, e.g., Mel-frequency cepstral coefficients (MFCCs), also work well for various problems with music signals, e.g., instrument or genre classification, and source identification in search and retrieval. Beyond the argument that the two signal classes are both acoustic, there is some evidence that MFCCs work for musical signals because they compactly describe the timbre of a sound, somewhat independently of pitch (hence the gender neutrality of speech recognition using MFCCs, i.e., all we need to know are the formant locations on a frame-by-frame basis). For the most part, the use of MFCCs for such tasks with musical signals has worked very well, though not as well as for the much more well-behaved class of speech signals. Compared with musical signals, speech signals are much more bandlimited, are less varied across sources (people), and are generated in a way that is amenable to decomposition as a separable source-filter model, i.e., an autoregressive process driven by a periodic and/or stationary excitation. In addition, musical signals are often composed of a sum of numerous sources (polyphony), and hence in the non-linear MFCC representation the timbre of the sum is not the sum of the timbres. (These features break down in similar ways for speech signals when there are multiple speakers.)
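That last point is easy to verify numerically: because of the Mel-band pooling and the log, the MFCC mapping is non-linear, so the MFCCs of a mixture are not the sum of the MFCCs of the sources. A quick check with librosa and two toy "sources":

```python
import numpy as np
import librosa

sr = 22050
t = np.arange(sr) / sr
x1 = np.sin(2 * np.pi * 220 * t)       # one "source"
x2 = np.sin(2 * np.pi * 3000 * t)      # another "source"

mfcc = lambda x: librosa.feature.mfcc(y=x, sr=sr, n_mfcc=13)
lhs = mfcc(x1 + x2)                    # timbre of the sum
rhs = mfcc(x1) + mfcc(x2)              # sum of the timbres
print(np.abs(lhs - rhs).mean())        # far from zero
```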
Yesterday Wired posted 6 Mashups of Music and Artificial Intelligence, where "mashup" is l33t 5p34k for "combination". Among their examples, I see three applications (uJam, Microsoft SongSmith, and LaDiDa) that attempt to bring music to people who aren't musical (but should be?) by applying compositionally dull accompaniment to a melody using a derived set of rules. The results of all these applications are more hilarious (and annoying) to me than serious. Another application, The Swinger, is a fun toy that performs high-level segmentation and time dilation on a steady-tempo audio signal to give it "swing." (I especially enjoy Enter Sandman this way.)

The remaining two applications actually use machine learning. The first, "Emily Howell," is created by University of California, Santa Cruz professor David Cope. This application is built upon a significant amount of Cope's previous work in automating music composition and imitating musical style by data mining and a rough form of concatenative techniques. (At least I remember him discussing the concatenative techniques in his book, Computer Models of Musical Creativity. Cambridge, MA: MIT Press, 2005.) This work is rooted in a history that extends back to Lejaren Hiller and Leonard Isaacson programming a computer in 1955 to compose the excellent, and mysteriously Americana-style, string quartet, The Illiac Suite. And like them, Cope is a true pioneer with skills in both computer programming and music composition.

The second application is the robot drummer from Georgia Tech --- a robot that actually listens, interprets, and learns how to accompany human players. The behavior of this work reminds me of The Continuator, developed at the Sony Computer Science Laboratory by Dr. François Pachet et al. I saw a preliminary version of that program demonstrated at the 2000 International Computer Music Conference in Berlin, and was impressed by how well it worked at "continuing" what the human player started. (I think it won a best paper award too.)

In my opinion, of the six applications here, these two are the most interesting from both a research and compositional perspective. They do not create and recreate tired idioms; and they do not attempt to address the assumed "problem" that, "not many people have the time or desire to learn about the craft of music, but they want to be a pop star." Ok, enough with the bitterness --- I am coming up from the "low" that accompanies grant proposal preparation.
Hello, and welcome to Paper of the Day (Po'D): Consensus matching pursuit for multi-trial EEG signals Edition. Today's paper is: C. G. Bénar, T. Papadopoulo, B. Torrésani, and M. Clerc, "Consensus matching pursuit for multi-trial EEG signals," J. Neuroscience Methods, vol. 180, pp. 161-170, Mar. 2009. With this paper, we add yet another greedy sparse approximation method with the acronym CMP: First, there was Cyclic MP (M. G. Christensen and S. H. Jensen, "The cyclic matching pursuit and its application to audio modeling and coding," in Proc. Asilomar Conf. Signals, Syst., Comput., (Pacific Grove, CA), Nov. 2007); then there was Complementary MP (G. Rath and C. Guillemot, "A complementary matching pursuit algorithm for sparse approximation," in Proc. European Signal Process. Conf., (Lausanne, Switzerland), Aug. 2008). Now we have Consensus MP. Are there any takers for Conjugate MP? Or Confabulatory MP? Or Circumlocutive MP? (which one may argue should be the MP decompositions that always go awry.) Or perhaps Cholangiocholecystocholedochectomy MP? (First person to put that in an abstract gets an award from me.)
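For readers keeping score of all these MPs, here is the vanilla algorithm they all elaborate upon: greedily select the atom most correlated with the residual, subtract its contribution, and repeat. A minimal numpy sketch:

```python
import numpy as np

def matching_pursuit(D, y, n_iter=20, tol=1e-6):
    """Plain MP over a dictionary D with unit-norm columns."""
    residual = y.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual
        i = np.argmax(np.abs(corr))        # best-matching atom
        coeffs[i] += corr[i]
        residual -= corr[i] * D[:, i]      # update the residual
        if np.linalg.norm(residual) < tol:
            break
    return coeffs, residual

# Toy usage: a signal built from two atoms is recovered greedily.
rng = np.random.default_rng(2)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)
y = 2.0 * D[:, 3] - 1.5 * D[:, 40]
c, r = matching_pursuit(D, y)
print(np.nonzero(c)[0], np.linalg.norm(r))  # atoms 3 and 40 dominate
```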
Now that I am finished with a submission to Asilomar 2010, I have a brief respite before beginning to read 8 student project reports, and preparing two grant proposals. So, about 10 minutes.

Hello, and welcome to Paper of the Day (Po'D): Speech Recognition by Sparse Approximation Edition, No. 2. Today's paper describes using sparse approximation to aid in the automatic recognition of spoken connected digits corrupted by noise: J. Gemmeke and T. Virtanen, "Noise robust exemplar-based connected digit recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., (Dallas, TX, USA), March 2010. This work extends that which I reviewed in a previous Po'D here.
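The gist of the exemplar-based approach, as I understand it: stack (flattened) spectrogram patches of clean speech and of noise into one dictionary, find a non-negative, sparse combination that explains the noisy observation, and let the speech-exemplar activations drive recognition. In the toy sketch below, plain non-negative least squares stands in for the sparsity-penalized solver of the paper:

```python
import numpy as np
from scipy.optimize import nnls

def exemplar_activations(speech_ex, noise_ex, observed):
    """Explain an observed magnitude-spectrogram patch (flattened)
    as a non-negative combination of speech and noise exemplars.
    NNLS stands in for the sparse solver used in the paper."""
    A = np.hstack([speech_ex, noise_ex])    # columns = exemplars
    x, _ = nnls(A, observed)
    n_speech = speech_ex.shape[1]
    return x[:n_speech], x[n_speech:]       # speech vs. noise weights

# Toy usage with random non-negative "exemplars".
rng = np.random.default_rng(3)
d = 100                                     # patch dimension
S = rng.random((d, 20))                     # 20 speech exemplars
N = rng.random((d, 10))                     # 10 noise exemplars
obs = 0.8 * S[:, 4] + 0.5 * N[:, 2]         # a noisy observation
ws, wn = exemplar_activations(S, N, obs)
print(ws.argmax(), wn.argmax())             # expect 4 and 2
```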
