March 2012 Archives

Following up on my previous experiments, let's take a look at some results when we train and test on hi-fi audio recordings. As we saw before, with the Bergstra et al. method, recognition accuracy was quite high at just below 80%. As far as I have seen in the literature, this is usually as far as people go when looking at the results. Let's dig a little deeper and see where mistakes are made. Below we see the 10 classifications of all 100 files labeled Disco from 10 runs of 5-fold cross validation. (The darker the square, the more times the excerpt was labeled that particular genre.)
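For the curious, the vote counts behind such a figure can be accumulated like this (a sketch with made-up predictions, not the actual classifier outputs):

```python
import numpy as np

GENRES = ["Blues", "Classical", "Country", "Disco", "Hip hop",
          "Jazz", "Metal", "Pop", "Reggae", "Rock"]

def label_counts(predictions):
    """predictions: one list of predicted genre indices per CV run,
    one entry per excerpt. Returns an (excerpts x genres) matrix of
    vote counts -- the darkness of each square in the figure."""
    counts = np.zeros((len(predictions[0]), len(GENRES)), dtype=int)
    for run in predictions:
        for i, g in enumerate(run):
            counts[i, g] += 1
    return counts

# Toy example: 3 excerpts, 4 runs of cross-validation
preds = [[3, 3, 9], [3, 8, 9], [3, 3, 9], [3, 8, 9]]
counts = label_counts(preds)
print(counts[1])  # excerpt 1: 2 votes Disco, 2 votes Reggae
```

Each row of `counts` is one row of squares in the figure; a 10-out-of-10 column is a maximally dark square.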

[Figure: confusionzoom_excerpt_disco_tree.jpg]

Some things here are very encouraging. For instance, entry 27 should actually be labeled Hip hop since it is "Rapper's Delight" by the Sugarhill Gang; and the algorithm seems to agree, with 10 out of 10 votes there. Entry 15, though, is the quintessential Disco-defining tune "Boogie Nights" by Heatwave, and the algorithm has labeled it Pop. Maybe that is ok. Not ok, however, is that 8 times out of 10, "Shake your booty" (entry 69) is labeled Blues. And George McCrae's "I can't leave you alone" (entry 39) is 10 out of 10 times labeled Country. To hear just how Country this tune is, listen for yourself:

We also see entry 64, "Funky Town" by Lipps, Inc., is 10 out of 10 Reggae. "Funky Town" is only one of the greatest Disco hits ever. Most damning though is entry 83, which 10 times out of 10 is classified as Rock. Any algorithm able to distinguish Disco from the other nine categories would not make such a mistake:

That there I call the "Disco Duck Test", and any good music genre algorithm should be able to get it from the bass line, horns, and open high hat pattern --- except current algorithms do not look for such high-level things.

What about the method employing sparse approximation? Here are its Disco entry confusions (done here with 100 tests of 10-fold cross validation).

[Figure: confusionzoom_excerpt_disco_SRCATM.jpg]

Applying the Disco Duck Test, we are happy to see entry 83 has been correctly classified. And "Shake your booty" (entry 69) is labeled Disco. Furthermore, it classified LaToya Jackson's mislabeled excerpts (entries 23 and 26) as Pop. And entry 85, "Wordy Rappinghood", it thinks is Hip hop --- which it is! Not Disco. So there is hope! However, "Rapper's Delight" by the Sugarhill Gang (entry 27) is now 100 out of 100 Reggae. "Boogie Nights" (entry 15) is more Pop than Disco. And George McCrae's "I can't leave you alone" (entry 39) is Reggae and a little Pop, but never Disco. Furthermore, and unforgivably, this algorithm considers entry 67 to be 100% Metal. Here is the pure Metal for you:

(Entry 72 is also by Abba, and apparently pure Metal.)

There are many other problems like this in the other genres. For instance, one of the methods thinks "The Lion Sleeps Tonight" and "A whole new world" from Aladdin are 10 out of 10 of the Blues genre. The Beastie Boys' "You gotta fight" is 10 out of 10 Metal (which is not as unforgivable as the other mistakes); AC/DC's "Highway to hell" is nearly Disco; and so on and so forth. Of course, we don't want to throw the baby out with the bath water, but it is clear to me that the time is long past to answer whether the algorithms of the past decade of work even discriminate between music genres, or just between confounding variables.
First, listen here:

and now here:

In both cases, we hear John Lee Hooker playing "One bourbon, one scotch, one beer". In the first case, though, the sound is FM quality (although compressed to 32 kbps MPEG-1 layer 3); and in the second it is AM quality (same bit rate as before, bandpassed between 50 Hz and 4500 Hz). I think it would be difficult to find anyone familiar with the Blues genre who would have trouble classifying both as Blues. In other words, though I have changed the characteristics of the transmission channel, the genre of the music underlying the audio data has not changed; its Bluesiness remains the same. (Hearing the AM-quality one, though, makes me nostalgic. My father always listens to AM radio when he drives.)
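For anyone who wants to replicate the "AM quality" channel, the band-limiting is simple to sketch; here is a brick-wall version via the FFT (the actual audio examples were presumably made with a proper filter, so this is only an illustration):

```python
import numpy as np

def band_limit(x, fs, lo=50.0, hi=4500.0):
    """Brick-wall bandpass: zero every FFT bin outside [lo, hi] Hz."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

fs = 44100
t = np.arange(fs) / fs                   # one second of audio
inband = np.sin(2 * np.pi * 1000 * t)    # survives the channel
outband = np.sin(2 * np.pi * 8000 * t)   # removed by the channel
print(np.std(band_limit(inband, fs)), np.std(band_limit(outband, fs)))
```

The 1000 Hz tone passes through essentially untouched while the 8000 Hz tone is wiped out, yet nothing about the underlying music has changed.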

25 points for new PhD students

I wrote these at the conclusion of my PhD, summarizing how I survived a tortuous process that was as much psychologically as intellectually challenging.

1. I am not my work; my work is not my worth.
This took a while to learn, but once I did I was no longer offended by my algorithms performing 250% shittier than state-of-the-art.
2. I can only do as best as I can do.
That there is logic.
3. I can't please everyone.
I have learned to deal with the fact that some people have sticks up their asses. It is not my job to whittle them.
4. Burn no bridges.
I made that mistake one time too many; in graduate school I put this rule into practice.
5. If I want to get a grip, I have to let go.
That there is semantics.
6. Feel the fear and do it anyway.
7. A Ph. D. is something one does while living.
In other words, life happens.
8. Take nothing personally.
Conference paper rejected? Water off a duck's back. Professor jumping into bushes when he sees you coming? Water off a duck's fucking back.
9. Surround yourself with motivated and inquisitive people. If you don't know anyone, spend your time in the library until you do.
I perform my best when other people are doing their best. And I do my best so other people can do their best.
10. Research should be scary and uncertain if the terrain is uncharted. Think of yourself as a pioneer.
The first part was told to me by Dr. Gibson. The second part was told to me by Dr. Rabiner.
11. If your advisor says nothing, take that as a compliment.
See #8.
12. Learn to depend only on yourself, e.g., JFGI, RTFM. Don't be needy.
Maybe I have a stick up my ass too?
13. A dissertation should tell a story of research; it does not need to solve the world's problems. You are not expected to solve the world's problems.
Good advice from all my advisors.
14. Question everything. Argue carefully. Point out all assumptions. Make logical steps to your claims. Be honest. Respect peer review. It is why science is the most trustworthy reflection of the real world. Don't fuck that up.
These are incredibly important for distinguishing your work from that of the makers of The Secret (which does not even deserve a hyperlink, JFGI), and other inane bullshit peddled by anti-science cretins.
15. Intelligence is not as important to doing a Ph. D. as the diligence, persistence, honesty, and bravery of your spouse.
16. Work on yourself as much as your research. Join a support group, start therapy, work a 12-step program.
Also, turning 30 helped. My concentration went hyper when I turned 30.
17. When interacting with your advisor, do not bring baggage accompanying personal problems. Do not give too much information.
Following this has made interactions pleasant and productive.
18. You don't need to know everything. By the end you will understand how to be comfortable with the feeling that you don't know everything, nor need to, because you know where to begin.
Completing a Ph. D. brings the confidence to think critically, to communicate your ideas, and to lead into the darkness.
19. If you wish you could find the answers in the back of a book, but your stomach drops when you find a paper that appears to address the same problem as your research, you are on the right track.
Strange how that works.
20. Keep an honest record of how you spend your time.
I kept a spreadsheet logging the hours I spent doing research, class work, teaching, and putzing around on the Internet. It helped keep me on track, but made me a bit anal. Plus, at the end, I made cool graphs and saw my behavior around conference deadlines.
21. Keep a thorough record of your research, including questions, insights, thoughts, ideas, graphs, programs, doodles, correspondence, CFPs, etc.
I got a bound chemistry lab book and started keeping detailed records. When code would disappear, I could go back and see what I had written. Best of all, someday they could end up in a museum, or a presidential library!
22. Break your tasks into small manageable chunks. Create and solve "toy" problems.
Good advice dispensed by Dr. Rabiner.
23. Collect and annotate all references. I use Bibdesk.
This record is indispensable when managing more than 50 references. (And now, with my experiences here, I say start a blog and contribute good research to the Internet.)
24. Compile into reports all the research and work you do over each month.
My main advisor made me do this and give him copies; and then after four months he admitted, "I never read those." Then I realized, "Those are for me, not him. I am the grasshopper, Wise One."
25. Don't be discouraged when looking at other people's dissertations -- yours will appear just as complex to them.
Hello, and welcome to Paper of the Day (Po'D): Missing Data Imputation in Noise Robust Speech Recognition Edition. Today we have another paper from the PSI Speech research group at the Katholieke Universiteit Leuven, Belgium. Today's paper is: J. F. Gemmeke, H. Van hamme, B. Cranen and L. Boves, "Compressive Sensing for Missing Data Imputation in Noise Robust Speech Recognition", IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 272-287, Apr. 2010.

NMF and Multiplicative Updates

Consider that, given the non-negative vector \(\vu \succeq 0\) and non-negative matrix \(\MPsi \succeq 0\), we want to build the model $$ \vu \approx \MPsi \vx $$ with the restriction of a non-negative solution \(\vx \succeq 0\). Thus, we might want to solve $$ \min_\vx \frac{1}{2} \|\vu - \MPsi \vx \|_2^2 \; \textrm{subject to} \; \vx \succeq 0 $$ using the Euclidean cost function, or instead, with the generalized Kullback-Leibler divergence (KLD), $$ \min_\vx \vu^T \log(\textrm{diag}(\MPsi\vx)^{-1}\vu) - \|\vu\|_1 + \|\MPsi\vx\|_1 \; \textrm{subject to} \; \vx \succeq 0 $$ where \(\log\vs\) applies the logarithm to each component of \(\vs\), and the \(\ell_1\) norms reduce to plain sums since everything is non-negative. Finding such solutions is a portion of non-negative matrix factorization (NMF).
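A common way to compute such a non-negative solution is with multiplicative updates in the style of Lee and Seung. Here is a minimal sketch for the KLD cost (my own illustration with random data; the dimensions and iteration count are arbitrary):

```python
import numpy as np

def nmf_kl_solve(u, Psi, n_iter=500, eps=1e-12):
    """Multiplicative updates for min_{x >= 0} KLD(u || Psi x).
    Starting from a positive x, each update preserves non-negativity
    and does not increase the cost."""
    x = np.ones(Psi.shape[1])
    col_sums = Psi.sum(axis=0) + eps     # Psi^T 1, the constant part of the gradient
    for _ in range(n_iter):
        x *= (Psi.T @ (u / (Psi @ x + eps))) / col_sums
    return x

rng = np.random.default_rng(0)
Psi = rng.random((20, 5))
u = Psi @ rng.random(5)                  # u is exactly representable
x = nmf_kl_solve(u, Psi)
print(np.linalg.norm(u - Psi @ x) / np.linalg.norm(u))  # small relative residual
```

Because the update is a componentwise multiplication by a non-negative factor, the constraint \(\vx \succeq 0\) is maintained for free, with no projection step.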
Then my best advice for you is to be sure to include a letter with your submission that reminds everyone involved what a correspondence item is all about. On the IEEE Transactions on Audio, Speech, and Language Processing website it says,
Correspondence items are short disclosures with a reduced scope or significance that typically describe a use for or magnify the meaning of a single technical point, or provide brief comments on material previously published in the TRANSACTIONS.
Be sure to include this in your letter, citing the IEEE website, and maybe even include it in your paper itself. Within your paper, call it a correspondence, rather than an article. Furthermore, in your letter, explain to the reviewers and associate editor how your paper is a correspondence, that you are submitting it as a correspondence, and that, as a correspondence and not a research article, it should have a reduced scope and provide brief comments on previously published material --- in fact, on material previously published in the Transactions. Include the references to the previously published material as well. Stress the point that as a correspondence item it need not propose and test new algorithms, or solve new technical problems; instead, show how your correspondence magnifies the meaning of a single or a few technical points.
Hello, and welcome to Paper of the Day (Po'D): Practical Implementations of Exemplar-based Noise Robust Automatic Speech Recognition Edition. Today I begin reviewing a series of interesting papers coming from the PSI Speech research group at the Katholieke Universiteit Leuven, Belgium. Today's paper is: J. F. Gemmeke, A. Hurmalainen, T. Virtanen and Y. Sun, "Toward a practical implementation of exemplar-based noise robust ASR", in Proc. EUSIPCO, pp. 1490-1494, 2011.
I have started to poke at some of the questions of a few months ago about music genre recognition. In particular, I want to see how the faults in this well-used dataset affect the results of classification. One might assume that these faults only reduce the performance of algorithms; however, the faults of the dataset (1000 excerpts) are so varied that I cannot be sure about this until I do further testing. For instance, with so many exact replicas (54), it is possible that in cross-validation the same features appear in both the training and test sets, which will of course increase the mean performance in particular folds. There are also many excerpts from the same artist and/or album (e.g., 28 from Bob Marley, 24 from Britney Spears), from the same recording (12), and versions of the same song (12). Thus, the producer effect and artist effect will inflate performance. With all the mislabelings, though (118), accuracy could be hurt. And in the cases where the training set has multiple copies of the same features, we can consider the training data to be not as rich as thought, which will decrease performance. All in all, the good and bad effects of the faults may cancel each other out; indeed, the results of classifiers run on this faulty dataset do not appear too different from those obtained using other music genre datasets (which might have similar problems, but I am not sure). So this is an interesting question.
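As a side note, one way to control for the artist and producer effects described above is to keep all excerpts sharing an artist in the same cross-validation fold. A minimal sketch (the artist names here are placeholders, not the dataset's actual metadata):

```python
def group_folds(artists, n_folds=5):
    """Assign each excerpt to a fold such that all excerpts by one
    artist share a fold (greedy balancing by current fold size)."""
    sizes = [0] * n_folds
    fold_of_artist = {}
    folds = []
    for a in artists:
        if a not in fold_of_artist:
            # First time we see this artist: put them in the smallest fold
            fold_of_artist[a] = sizes.index(min(sizes))
        f = fold_of_artist[a]
        sizes[f] += 1
        folds.append(f)
    return folds

artists = ["Marley"] * 4 + ["Spears"] * 3 + ["ABBA"] * 2 + ["McCrae"]
print(group_folds(artists, n_folds=3))  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With such a split, no fold ever tests on an artist it trained on, so any performance drop relative to a naive split is a rough measure of the artist effect.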

Secondly, I remain to be convinced, for the many algorithms proposed to recognize genre, that it is actually the music (rhythm and instrumentation, for instance), and not extramusical features (such as compression), that is driving the recognition. In other words, I doubt the simplest problems have even been solved.
9th Sound and Music Computing Conference
12-14 July 2012, Copenhagen, Denmark
Department of Architecture, Design and Media Technology,
Aalborg University Copenhagen

SMC 2012 and Computer Music Journal (CMJ) are happy to announce that the two most highly rated papers presented at this year's conference will be selected for expansion and publication in a future issue of CMJ.

The SMC Conference is the forum for international exchange around the core interdisciplinary topics of Sound and Music Computing. SMC2012 will feature paper, poster, and demo sessions, tutorials, musical concerts, and other satellite events. We invite submissions examining all core areas of sound and music computing, such as
  • Automatic music generation/accompaniment systems
  • Computer environments for sound/music processing/composition
  • Networked music generation
  • Physical modeling for sound generation
  • Sound/music signal processing algorithms
  • Digital audio effects
  • Musical sound source separation and recognition
  • Automatic music transcription
  • Music information retrieval
  • Musical pattern recognition and modeling
  • Music and robotics
  • Computational musicology
  • Sonic interaction design
  • 3D sound/music
  • Data sonification
  • Visualization of sound/music data
  • Interfaces for music creation
  • Interactive performance systems
  • Musical performance modeling
  • Sound/music perception and cognition
  • Multimodality in sound and music computing
  • Web 2.0 and mobile music and audio
All submissions are peer-reviewed according to their novelty, technical content, presentation, and contribution to the overall balance of topics represented at the conference. Paper submissions should be no longer than 8 pages, including figures and references. Accepted papers will be designated for presentation either as posters or as lectures, possibly augmented with a demo session. Authors are encouraged to state whether they wish to have a poster or a lecture, with or without an additional demo session.

Paper submissions will be made electronically by the submission deadline of Monday 2 April 2012. The notification of acceptance will be sent by Wednesday 2 May 2012. The deadline for submission of the camera-ready papers is Monday 4 June 2012. At least one of the paper's authors must be registered for the conference for the paper to be accepted and published.

All accepted papers, independently of the presentation format, will be included in the conference Proceedings which will be distributed as an electronic publication.

For more information, including templates, please check

================Important dates=================
  • Deadline for submissions of music and sound installations: DONE
  • Deadline for paper submissions: Monday 2 April, 2012
  • Deadline for submission of final music and sound installation materials: Friday, March 30, 2012
  • Deadline for applications to the Summer School: Friday March 30, 2012
  • Notification of acceptance to Summer School: Monday April 16, 2012
  • Notification of paper acceptances: Wednesday 2 May, 2012
  • Deadline for submission of camera-ready papers: Monday 4 June, 2012
  • SMC Summer School: Sunday 8 - Wednesday morning 11 July, 2012
  • SMC Workshops: Wednesday afternoon 11 July, 2012
  • SMC 2012: Thursday 12 - Saturday 14 July, 2012
[Figure: modernart1.jpg]

During a recent trip to a modern art museum, I was absolutely delighted to find that the collection contains Robert Rauschenberg's ``White Paintings'' from 1951. To my dismay, however, I found that one of the pictures has been hung upside down. The security guard who yelled ``You're too close!'' couldn't have cared less, saying ``You got a screw loose, buddy --- orientation means nothing with respect to the abstract. Now get a move-on before I taze you.'' So I decided to passive-aggressively conduct a poll to prove that one of the paintings is incorrectly hung. Over the course of a few hours, I asked 50 people to pick which one of three paintings is hung incorrectly, and whether it is top-left (TL), top-right (TR), or top-down (TD). My data is shown below.
Hello, and welcome to Paper of the Day (Po'D): Implementations of Orthogonal Matching Pursuit Edition. Today's paper is my first of three submissions to 2012 EUSIPCO, and provides an overdue look at the numerical properties and performance characteristics of three implementations of orthogonal matching pursuit: B. L. Sturm and M. G. Christensen, "Comparison of Orthogonal Matching Pursuit Implementations", submitted to EUSIPCO 2012, Bucharest, Romania, Aug. 2012. All the MATLAB code for reproducing the experiments and figures is here.

My one line summary of this paper is:
Wow, the QR implementation of OMP really cooks!
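To give a flavor of what a QR-based OMP looks like (a sketch of the general technique, not the code from the paper; see the linked MATLAB code for that), here the thin QR factorization of the selected atoms is grown one column per iteration, so each residual update and the final coefficient solve are cheap:

```python
import numpy as np

def omp_qr(D, x, k):
    """OMP maintaining a thin QR factorization of the selected atoms.
    Each iteration appends one orthonormalized column to Q, so the
    residual update is a single projection and the coefficient solve
    is back-substitution against a small triangular R."""
    m = D.shape[0]
    r = x.copy()
    idx = []
    Q = np.zeros((m, k))
    R = np.zeros((k, k))
    for i in range(k):
        j = int(np.argmax(np.abs(D.T @ r)))   # best-correlated atom
        idx.append(j)
        # Gram-Schmidt: orthogonalize the new atom against Q[:, :i]
        R[:i, i] = Q[:, :i].T @ D[:, j]
        w = D[:, j] - Q[:, :i] @ R[:i, i]
        R[i, i] = np.linalg.norm(w)
        Q[:, i] = w / R[i, i]
        # r is already orthogonal to the earlier columns of Q
        r -= Q[:, i] * (Q[:, i] @ r)
    s = np.linalg.solve(R, Q.T @ x)           # triangular system
    return idx, s

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)                # unit-norm atoms
s_true = np.zeros(128)
s_true[[5, 40, 99]] = [1.0, -2.0, 0.5]
idx, s = omp_qr(D, D @ s_true, 3)
print(sorted(idx))  # should recover the support [5, 40, 99]
```

The speedup over the naive implementation comes from never re-solving the full least-squares problem: the factorization from the previous iteration is reused.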

Fun with probability

I am teaching an undergraduate course this semester about the design and analysis of experiments, and a major portion of my syllabus entails giving the students an appreciation of, and ability to use, the awesome mechanics supplied by one of my favorite subjects: probability theory. I have always wanted to teach my own probability class, and this was my first chance. So I have packed it full of fun games, thought experiments, and surprising results. I think it is always nice to remind myself how hopeless my feeble mind is at intuiting likelihoods from gut instinct alone.

Anyway, here is a game that I have only seen discussed briefly in a limited form, and have yet to find its formal treatment (though I have not looked too hard because I don't know what to search for). Have a friend (or enemy) take 3 slips of paper, write a different number on each (anything from \(-\infty\) to \(\infty\), exclusive), and mix them in a stovepipe hat. Now you draw the first slip. You can declare it to be the largest number of the three, or discard it and draw another. Once you discard a number, though, you cannot return to it. In this way, you must find the largest number to win.
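This looks like a small-n relative of the classical secretary problem. A quick simulation (my own sketch) compares the strategy "discard the first slip, then accept the first number that beats everything seen" with simply keeping the first slip:

```python
import random

def play(skip, n=3, trials=100_000, seed=0):
    """Win rate of: discard the first `skip` slips, then accept the
    first slip larger than everything seen (taking the last if forced).
    Only relative ranks matter, so we shuffle the ranks 0..n-1."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))
        rng.shuffle(ranks)
        best_seen = max(ranks[:skip], default=-1)
        pick = ranks[-1]                 # forced to take the last slip
        for r in ranks[skip:]:
            if r > best_seen:
                pick = r
                break
        wins += (pick == n - 1)          # did we take the largest number?
    return wins / trials

print(play(0), play(1))  # keeping the first slip wins ~1/3; skipping one wins ~1/2
```

With three slips you can check the 1/2 by hand over the six orderings: discarding one slip and then taking the first number that beats it wins in exactly three of them.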

About this Archive

This page is an archive of entries from March 2012 listed from newest to oldest.

February 2012 is the previous archive.

April 2012 is the next archive.
