## Paper of the Day (Po'D): On Theorem 10 Edition

Hello, and welcome to the Paper of the Day (Po'D): On Theorem 10 Edition. Today's paper is now available with early access: B. L. Sturm, B. Mailhé, and M. D. Plumbley, "On Theorem 10 in 'On Polar Polytopes and the Recovery of Sparse Representations'," IEEE Trans. Inf. Theory (in press).

Though most of my past 16 months has been spent on a completely different topic, I have still had some fun in sparse approximation theory. In this paper, we illuminate two previous results about exact recovery, and in the process discover a more general exact recovery condition for basis pursuit (BP). I found this work fascinating for several reasons. The first is that orthogonal matching pursuit (OMP) and BP are not exactly trivial to analyze when the dictionary does not have unit-norm atoms. The second reason is that this led us to an exact recovery condition (ERC) for BP different from Tropp's. The third reason is that the previous result of Plumbley --- which is not wrong, but does not express the properties of any recovery algorithm --- led us to investigate the nestedness properties of OMP and BP. And this shows that OMP and BP are quite different algorithms in the nestedness department.
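To fix ideas, here is a minimal sketch of the OMP greedy loop, not the paper's analysis: I use a toy orthonormal dictionary (the standard basis) with unit-norm atoms, the easy case in which exact recovery is trivial; the paper is concerned with what happens when that unit-norm assumption is dropped.

```python
import numpy as np

# Toy orthogonal matching pursuit (OMP). With an orthonormal dictionary
# and unit-norm atoms, the greedy selection recovers the true support.
def omp(D, y, k):
    """Greedily select k atoms of D to approximate y."""
    support, residual = [], y.copy()
    for _ in range(k):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        support.append(idx)
        # re-fit all selected atoms by least squares (this is what
        # distinguishes OMP from plain matching pursuit)
        coefs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coefs
    return sorted(support), residual

D = np.eye(8)                       # orthonormal toy dictionary
y = 3.0 * D[:, 2] + 1.0 * D[:, 5]   # a 2-sparse signal
support, residual = omp(D, y, 2)
print(support)  # [2, 5]
```

After two iterations the residual is (numerically) zero and the true support is recovered.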

Regarding that last point, we have an ICASSP paper this year: B. Mailhé, B. L. Sturm, and M. D. Plumbley, "Recovery of nested supports by greedy sparse representation algorithms", Proc. ICASSP, 2013.

## Paper of the Day (Po'D): Evaluating music emotion recognition: Lessons from music genre recognition? edition

Hello, and welcome to the Paper of the Day (Po'D): Evaluating music emotion recognition: Lessons from music genre recognition? edition. Today's paper is my third accepted for presentation at the 2013 IEEE Int. Conf. on Multimedia and Expo: B. L. Sturm, "Evaluating music emotion recognition: Lessons from music genre recognition?"

The one-line summary of this paper, for those in a hurry: Meaningful conclusions about music genre/emotion recognition systems do not follow from standard approaches to evaluation. Here is why, and what to do about it.

In this paper, we finally identify a major and fundamental problem with most research in music genre/emotion recognition. Starting with my paper "Two Systems for Automatic Music Genre Recognition: What Are They Really Recognizing?", I knew something was not right: why do state-of-the-art, high-performing music genre recognition systems behave so strangely? Surely, someone else had remarked on this behavior, and taken different approaches to evaluating systems designed to address this extremely complex problem.

So, I looked at how genre recognition systems have been evaluated, reading papers and cataloging their approaches to evaluation: "A Survey of Evaluation in Music Genre Recognition." This revealed that hardly anyone has thought much about evaluation: the vast majority uses the standard machine learning approach to evaluating supervised learning, i.e., comparing predicted labels with those of a ground truth, and presents accuracy as the figure of merit.
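That standard recipe fits in a few lines. The labels below are made up purely for illustration:

```python
# The standard evaluation recipe in a nutshell: compare predicted labels
# against a "ground truth" and report classification accuracy.
ground_truth = ["rock", "jazz", "rock", "blues", "jazz", "rock"]
predicted    = ["rock", "jazz", "blues", "blues", "rock", "rock"]

correct = sum(p == t for p, t in zip(predicted, ground_truth))
accuracy = correct / len(ground_truth)
print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.67
```

The figure of merit says how often the labels agree --- and nothing about *why* they agree, which is the whole problem.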

So, in "Classification Accuracy Is Not Enough: On the Evaluation of Music Genre Recognition Systems", we show why this evaluation approach --- used in 91% of the published work we review in our survey --- is incapable of measuring the depth to which a music genre recognition system recognizes genre. In short, a richer kind of evaluation is necessary for determining which proposed systems are promising for solving the problem.

Then, in "The GTZAN dataset: Its contents, its faults, their effect on evaluation, and its future use" (about to be resubmitted), we show that the results of 96 works that evaluate classification accuracy, even in the same dataset, cannot be meaningfully compared in any useful sense. (We also show that when taking into account all the faults of GTZAN, classification accuracies of systems that were estimated to be around 80% decay to 50% or lower.)

Now, in today's Po'D, we identify the principal goals of music genre/emotion recognition, and show why the most widely used approach to evaluating these systems provides nothing relevant. (We do not argue whether genre/emotion recognition is a good idea, or whether it is well-posed, and so on. We only address the fundamental problem of evaluating whether a system can recognize music genre or emotion.) In the words of Richard Hamming: "There is a confusion between what is reliably measured, and what is relevant. ... Just because a form of measurement is popular has nothing to do with its relevance." When a genre recognition system is tested by comparing labels in test data having many uncontrolled independent variables (e.g., dynamic compression, dynamic range, loudness, and so on), one cannot logically conclude that the performance is due to a capacity to recognize genre/emotion in music. Even when one sees 100% classification accuracy! Classification accuracy in this case, while easy to measure, is irrelevant for reliably determining whether a system is recognizing genre/emotion. The conclusion does not validly follow as long as independent variables other than the one of interest are left uncontrolled. This is basic experimental design, and it appears to have been rarely considered in music genre/emotion recognition.

In short: When Clever Hans trots into town, do not insist on asking more questions of the same kind.

Now, watch this and tell me, why does Maggie look to her handler when it was Oprah who asked the question?

## Optimum path forest at ISMIR 2011

Appearing at ISMIR 2011 was the following intriguing paper: C. Marques, I. R. Guilherme, R. Y. M. Nakamura, and J. P. Papa, "New Trends in Musical Genre Classification Using Optimum-Path Forest", Proc. ISMIR, 2011. As it reports classification accuracies in GTZAN above 98.8%, it certainly caught my attention. With respect to the classification accuracies in GTZAN reported in 94 other works, the accuracy of the optimum-path forest appears in the image below as reference [55]:

So, with the great help of the fourth author Joao Papa, and their excellent Optimum Path Forest library, I was quickly on my way to reproducing the results.

Joao has filled in a critical detail missing from the paper. Their results come from classifying every feature vector (computed from a 23 ms window) instead of the 30 s excerpts. This is even more curious to me since experience shows such classification should perform very poorly ... unless the partitioning of the dataset into training and test sets distributes features from the same excerpt across both sets instead of keeping them separated. Looking at the code behind the "opf_split" program confirms that it takes no care to avoid such a biased partition. Another curious detail in the paper is that they write they have 33,618 MFCC vectors from the 1000 excerpts in GTZAN. I get 1,291,628 MFCC vectors.
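A back-of-the-envelope check makes my count plausible. I assume here GTZAN's 22050 Hz sampling rate and non-overlapping 512-sample windows (512/22050 ≈ 23.2 ms); the paper does not state its exact analysis parameters, so treat these as assumptions:

```python
# Sanity check of the number of ~23 ms feature vectors in GTZAN,
# assuming 22050 Hz audio and non-overlapping 512-sample windows.
fs = 22050                         # GTZAN sampling rate (Hz), assumed
win = 512                          # window length in samples, ~23.2 ms
excerpt = 30 * fs                  # samples per 30 s excerpt
frames_per_excerpt = excerpt // win
total = 1000 * frames_per_excerpt  # 1000 excerpts in GTZAN

print(frames_per_excerpt)  # 1291
print(total)               # 1291000
```

That is on the order of 1.3 million vectors --- in the same ballpark as my 1,291,628, and nowhere near 33,618, which would amount to only about 34 vectors per excerpt.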

So, I decided to run this evaluation as I think they did:

./runOPF.sh alldata.bin 0.5 0.5 1 1

where "alldata.bin" is an OPF-formatted file of the features I compute in MATLAB, the first two numbers specify the train/test split, the last two numbers denote whether feature normalization is used, and how many independent trials to run. Here is some of the output:

Training time: 23248.525391 seconds
Testing time: 30824.958984 seconds
Supervised OPF mean accuracy 74.323967

We see that after nearly 15 hours of computation, we don't get anywhere near the reported 98.8% accuracy. And without feature normalization, the accuracy rises only to about 76.3%. The paper reports that the training and testing times for OPF in GTZAN are 9 and 4 seconds, respectively. Respectfully, my computer is not so slow as to cause a 7000-fold increase in computation time. I tried several other things to increase the accuracy, but nothing worked.

Then I tried training and testing on the same fold, and got an accuracy of 99.97%. Joao confirms that this appears to be at least part of what happened.
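It is easy to see why that is guaranteed to look near-perfect. Here is a toy illustration (not their code, and made-up 2-D "features"): with a nearest-neighbor rule, every test point in the training set finds itself at distance zero.

```python
import random

# Toy 1-nearest-neighbor classifier, to show why testing on the
# training set inflates accuracy: each test point's nearest neighbor
# is itself, at distance zero, so its own label is returned.
random.seed(0)
train = [((random.random(), random.random()), random.choice("AB"))
         for _ in range(200)]

def predict(x, data):
    # nearest training point wins (squared Euclidean distance)
    return min(data, key=lambda p: (p[0][0]-x[0])**2 + (p[0][1]-x[1])**2)[1]

acc_same = sum(predict(x, train) == y for x, y in train) / len(train)
print(acc_same)  # 1.0
```

The labels are random, so this 100% says nothing about any capacity to classify; distributing frames of the same excerpt across train and test sets produces a milder version of the same inflation.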

Now, I am going to run the same experiment, but using a proper partitioning, and the fault filtering necessary for evaluating systems with GTZAN. I predict we will see the classification accuracy drop from 74% to 55% or lower.

## Paper of the Day (Po'D): Music genre classification risk and rejection edition

Hello, and welcome to the Paper of the Day (Po'D): Music genre classification risk and rejection edition. Today's paper is my second accepted for presentation at the 2013 IEEE Int. Conf. on Multimedia and Expo: B. L. Sturm, "Music genre recognition with risk and rejection", Proc. ICME, 2013.

The one-line summary of my paper, for those in a hurry: Some misclassifications are much worse than others, so we show how to make an MGR system take that into account.

When it came time in my multivariate statistics course to come up with a fun example of considering risk in classification, I said, "Consider a music genre recognition system that labels a classical piece 'metal' --- the horror! Hence, we can specify that the system must be quite sure something is metal before calling it 'metal'." Then I said, "I will show you the results of this easy example in the next class period."

It took a bit longer than that to get the system working, as it was not as trivial as I thought. And while some researchers in music genre recognition over the past ten years have hinted at such a possibility, we find that no one has actually done it. Before I knew it, I had given birth to a paper!
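The basic decision rule can be sketched in a few lines. This is a generic minimum-expected-loss classifier with a reject option, with entirely hypothetical loss values and posteriors --- not the system or the numbers in my paper:

```python
# Risk-sensitive classification with a reject option (hypothetical
# numbers). Pick the label with minimum expected loss given the class
# posteriors; if even that minimum exceeds the cost of rejection, abstain.
labels = ["classical", "metal"]
# loss[a][j]: cost of deciding a when the truth is j. Calling a
# classical piece 'metal' is deemed far worse -- the horror!
loss = {"classical": {"classical": 0.0,  "metal": 1.0},
        "metal":     {"classical": 10.0, "metal": 0.0}}
reject_cost = 0.6

def decide(posterior):
    risks = {a: sum(loss[a][j] * posterior[j] for j in labels) for a in labels}
    best = min(risks, key=risks.get)
    return best if risks[best] <= reject_cost else "reject"

print(decide({"classical": 0.05, "metal": 0.95}))  # metal
print(decide({"classical": 0.20, "metal": 0.80}))  # reject
```

With these numbers, even 80% posterior confidence in 'metal' is not enough: the system abstains rather than risk the costly mistake, which is exactly the behavior asked for in class.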

## Paper of the Day (Po'D): Music genre classification via compressive sampling edition

Hello, and welcome to the Paper of the Day (Po'D): Music genre classification via compressive sampling edition. Today's paper is: B. L. Sturm, "On 'Music genre classification via compressive sampling'", Proc. ICME, 2013. This paper is the closing chapter on the findings reported in K. Chang, J.-S. R. Jang, and C. S. Iliopoulos, "Music genre classification via compressive sampling," in Proc. Int. Soc. Music Information Retrieval, (Amsterdam, the Netherlands), pp. 387-392, Aug. 2010.

The one-line summary of my paper, for those in a hurry: Results contradicting two well-supported findings of machine learning and music information research? We show the contradictions are not real.

I first discussed the work of Chang et al. here; and then two years later discussed several issues with the work, and finally reproduced it and submitted a paper with my code. My paper is now accepted and revised with many changes suggested by the helpful reviews. This is my third negative results paper (the first is here, the second here). I must take care to not become too negative!

Anyhow, it is quite satisfying to receive the following reviewer comment on my paper:

The paper provides extremely reproducible results that help to clear the confusion caused by previous works. The result is consistent with other works which show that compressive sampling / random projection reduce classification accuracy. Classification research is heavily directed by the top performers in the field. In this case, the authors address the failings of previous authors to sufficiently explain their methods. Without papers such as this one, the field continues to be muddied by works that claim inflated results without providing sufficient data to reproduce their work, and researchers waste time chasing phantom results. I applaud the rigor with which the research was performed and explained.
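For readers unfamiliar with the technique the reviewer mentions, random projection in its simplest form compresses a feature vector with a scaled Gaussian matrix. The dimensions and scaling below are illustrative assumptions, not taken from any of the papers discussed:

```python
import math, random

# Random projection in miniature: map a d-dimensional feature vector
# to k < d dimensions via a Gaussian matrix scaled by 1/sqrt(k).
# Dimensions here are illustrative only.
random.seed(1)
d, k = 60, 12
x = [random.gauss(0, 1) for _ in range(d)]
A = [[random.gauss(0, 1) / math.sqrt(k) for _ in range(d)]
     for _ in range(k)]

y = [sum(a_i * x_i for a_i, x_i in zip(row, x)) for row in A]
print(len(y))  # 12
```

Such projections roughly preserve pairwise distances on average (the Johnson-Lindenstrauss flavor of guarantee), but they can still discard information a classifier relies on --- which is why the finding that they reduce classification accuracy is well supported.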

## Programming in python

After two full days of wrangling my system to run Python and IPython --- during which I "\rm -Rf"ed a directory that I shouldn't have --- things appear to be running. I first went the way of Enthought --- which is free for academic use --- but then found it is only 32-bit on my 64-bit machine. (Peter Williams also points to Anaconda as an option.) So, I removed all of that, and used homebrew (also see here) to install Python and all the libraries I was missing, and pip to install the packages I need. Now, IPython with the notebook looks like an awesome way to develop! There is also IDLE.

Now, here are some good resources:

## Seminar: Friday March 15, INRIA, Rennes, France

I am giving a seminar this Friday, March 15, at INRIA in Rennes, France, at 11 AM in Room Sicile.

Title: Evaluation

Abstract: Evaluation. Evaluating evaluations.* Evaluation of evaluations and assessments. Evaluation, examination, and evaluation of testing evaluations. Evaluation gauging, gauging evaluation evaluations, and evaluating evaluation evaluations. Evaluating. Evaluating evaluation and evaluation evaluating. Evaluations of evaluations and evaluating evaluation evaluations. Evaluation of evaluation appraisals, evaluation evaluation evaluation analysis, and assessing evaluation evaluating evaluation evaluations. Finally, evaluation testing, evaluating evaluation testing evaluations, and reviewing evaluation evaluation. Evaluation.**

* Evaluation.

** Evaluating evaluation and evaluations.

(NB: I discuss evaluation in music information research, with particular emphasis on music genre recognition, music mood recognition, and autotagging. I argue that most approaches to evaluating such algorithms have no internal or external experimental validity when one wishes to measure their _capacity for recognition_. The example of the horse Clever Hans suffices to illustrate this point. Algorithms can attain very high classification accuracies by virtue of confounds, but current approaches to evaluation do not control for this.)

## MPTK in Python: basics

Today, I begin looking at what I need to do to reproduce my wivigram code from yesterday using the Python plotting library matplotlib. First, I download the dmg and install it. Then I read the user guide. Then I see there is something called Enthought, which provides an all-in-one working Python installation, and includes matplotlib! Then I see it costs money. Then I quickly return to the terminal.

So, in the terminal I type "python", and away we go! To lunch.

## From MPTK books to Wivigrams

Here is a sonogram of the introduction to "Pictor Alpha" by Curtis Roads.

Now, we make a dictionary of Gabor atoms of various sizes, and decompose this signal into 5000 atoms using MPTK. How can we visualize the output? One of the easiest ways is by a superposition of the Wigner-Ville distributions of the atoms, scaled by their energy in the decomposition. This produces the following picture.
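In outline, the visualization amounts to something like the following simplified sketch (not the linked code): the Wigner-Ville distribution of a Gabor atom is essentially a Gaussian blob in the time-frequency plane, so we sum one blob per atom, centered at the atom's time and frequency and weighted by its energy. The atom parameters here are made up:

```python
import math

# Simplified wivigram: sum a Gaussian time-frequency blob per Gabor
# atom, weighted by the atom's energy. Each atom is a made-up tuple
# (time, freq, time_spread, freq_spread, energy) on the unit square.
atoms = [(0.30, 0.25, 0.05, 0.04, 1.0),
         (0.70, 0.60, 0.08, 0.02, 0.5)]

N = 64  # grid resolution; W[j][i] is frequency row j, time column i
W = [[sum(e * math.exp(-((i/N - t)/st)**2 - ((j/N - f)/sf)**2)
          for (t, f, st, sf, e) in atoms)
      for i in range(N)] for j in range(N)]

# the peak should sit at the strongest atom's (time, freq) ~ (0.30, 0.25)
peak = max((W[j][i], i, j) for j in range(N) for i in range(N))
print(peak[1] / N, peak[2] / N)
```

The resulting 2-D array is what gets handed to an image plot (e.g., matplotlib's imshow) to produce the picture; a real wivigram of 5000 atoms is the same idea at scale, with spreads derived from each atom's window length.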

The code I used to generate this is here. Much more could be done to streamline the code by way of software engineering; but it only serves as a simple example of how to do such visualizations.

Now, time to look at doing the same but in python.

## MPTK 0.6.1 Installation on Mac OS X 10.7.5

Wait no longer! In 2010, I chronicled my installation of MPTK 0.5.3 on Mac OS X 10.6.4. Now, I do so for MPTK 0.6.1 on OS X 10.7.5.

First, download the sources from here. While it unpacks, I get some iced tea and think about the good old days. Now, let's say I just want to build and install the package from the command line.
1. I create a directory "mkdir MPTK061"; and "cd MPTK061".
2. Then I "cmake PATH_TO_SOURCE/MPTK-Source-0.6.1/".
3. We might need to adjust some things, but we need to generate the make script. So, I type "ccmake ." If you don't have MATLAB installed, turn off "BUILD_MATLAB_MEX_FILES". I leave mine on. Then I hit "c" for configuring, and "g" for generating.
4. Then I type "make".
5. Then I get errors:
In file included from /usr/local/include/mptk.h:295,
from /Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/src/matlab/classes/mptk4matlab.h:32,
/Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/mp_system.h:101:1: warning: "ULONG_MAX" redefined
In file included from /usr/include/machine/limits.h:6,
from /usr/include/limits.h:64,
from /Applications/MATLAB_R2010b.app/extern/include/tmwtypes.h:43,
from /Applications/MATLAB_R2010b.app/extern/include/matrix.h:293,
from /Applications/MATLAB_R2010b.app/extern/include/mex.h:59,
from /Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/src/matlab/classes/mptk4matlab.h:31,
/usr/include/i386/limits.h:75:1: warning: this is the location of the previous definition
In file included from /Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/src/matlab/classes/mptk4matlab.h:32,
/usr/local/include/mptk.h:3204: error: 'GP_Pos_Book_c' has not been declared
/usr/local/include/mptk.h:3212: error: 'GP_Param_Book_c' has not been declared
/usr/local/include/mptk.h:3231: error: 'GP_Book_c' has not been declared
/usr/local/include/mptk.h:3232: error: 'GP_Book_c' has not been declared
/usr/local/include/mptk.h:3238: error: 'GP_Param_Book_c' has not been declared
/usr/local/include/mptk.h:3239: error: 'GP_Param_Book_c' has not been declared
In file included from /Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/src/matlab/classes/mptk4matlab.h:32,
/usr/local/include/mptk.h:4272: error: 'GP_Block_Book_c' has not been declared
In file included from /Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/src/matlab/classes/mptk4matlab.h:32,
/usr/local/include/mptk.h:4914: error: 'GP_Pos_Book_c' has not been declared
/usr/local/include/mptk.h:5109: error: 'GP_Pos_Book_c' has not been declared
/usr/local/include/mptk.h:5244: error: 'GP_Pos_Book_c' has not been declared
/usr/local/include/mptk.h:5680: error: 'GP_Pos_Book_c' has not been declared
In file included from /Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/src/matlab/classes/mptk4matlab.h:32,
/usr/local/include/mptk.h:7731: error: ISO C++ forbids declaration of 'GP_Pos_Range_Sub_Book_c' with no type
/usr/local/include/mptk.h:7731: error: 'GP_Pos_Range_Sub_Book_c' declared as a 'virtual' field
/usr/local/include/mptk.h:7731: error: expected ';' before '*' token
In file included from /Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/src/matlab/classes/mptk4matlab.h:32,
/usr/local/include/mptk.h:7990: error: ISO C++ forbids declaration of 'GP_Pos_Range_Sub_Book_c' with no type
/usr/local/include/mptk.h:7990: error: 'GP_Pos_Range_Sub_Book_c' declared as a 'virtual' field
/usr/local/include/mptk.h:7990: error: expected ';' before '*' token
/usr/local/include/mptk.h:7997: error: ISO C++ forbids declaration of 'GP_Pos_Range_Sub_Book_c' with no type
/usr/local/include/mptk.h:7997: error: 'GP_Pos_Range_Sub_Book_c' declared as a 'virtual' field
/usr/local/include/mptk.h:7997: error: expected ';' before '*' token
/usr/local/include/mptk.h:8005: error: ISO C++ forbids declaration of 'GP_Pos_Range_Sub_Book_c' with no type
/usr/local/include/mptk.h:8005: error: 'GP_Pos_Range_Sub_Book_c' declared as a 'virtual' field
/usr/local/include/mptk.h:8005: error: expected ';' before '*' token

mex: compile of ' "/Users/bobs/Aalborg/research/201303/MPTK-Source-0.6.1/src/matlab/bookread.cpp"' failed.

make: *** [all] Error 2
6. WHAT IS GOING ON?!???
7. Ten minutes later, calmly note that my path still has the old version of MPTK.
8. So I delete /usr/local/include/mptk.h: "sudo rm /usr/local/include/mptk.h". Password: "*********"
9. Then I type "make" again; and to install it, "sudo make install". Password: "****"
Now, let's say I want to create an XCode project.
1. I create a directory "mkdir MPTK061"; and "cd MPTK061".
2. Then I "cmake -G Xcode PATH_TO_SOURCE/MPTK-Source-0.6.1/".
3. Now, I "ccmake .", adjust any settings, configure and generate as above.
4. This creates in the path, "MPTK.xcodeproj", which I open in XCode.
5. I select the ALL BUILD, hit run, and get errors:
-> gcc -O -Wl,-twolevel_namespace -undefined error -arch x86_64 -Wl,-syslibroot,/Developer/SDKs/MacOSX10.6.sdk -mmacosx-version-min=10.6 -bundle -Wl,-exported_symbols_list,/Applications/MATLAB_R2010b.app/extern/lib/maci64/mexFunction.map -o  "/Users/bobs/Aalborg/research/201303/MPTK061/mptk/matlab/dictread.mexmaci64"  /Users/bobs/Aalborg/research/201303/MPTK061/mptk/matlab/dictread.o  -L/Users/bobs/Aalborg/research/201303/MPTK061/lib -lmptk -lmptk4matlab -L/Applications/MATLAB_R2010b.app/bin/maci64 -lmx -lmex -lmat -lstdc++

collect2: ld returned 1 exit status

Command /bin/sh failed with exit code 2

6. This looks to be a linking error, and I currently don't know how to solve it.
