Recently in Machine learning Category

Hello, and welcome to the Paper of the Day (Po'D): A Survey of Evaluation in Music Genre Recognition. Today's paper is B. L. Sturm, "A Survey of Evaluation in Music Genre Recognition", Proc. Adaptive Multimedia Retrieval, Copenhagen, Denmark, Oct. 2012.

This paper is best summarized by a particularly riveting line of section 2.2:

The most-used publicly available dataset in music genre recognition work is that produced in [378,379], often called "GTZAN." This audio dataset appears in more than 23% (96) of the references [5,11,14,16,18,27,33,35-40,53,57,58, 84,91,106,107, 109, 114, 130, 131, 136, 138, 142, 143, 163, 164, 177, 182, 191, 199, 201, 202, 204-206, 208, 209, 212-215, 217, 218, 223, 236, 237, 240, 241, 246, 270, 272, 285-290, 314, 318, 319, 322, 323, 325, 331, 336, 337, 339-341, 344, 345, 362-366, 368, 371-374, 377-379, 398,399,402,404,405, 407,411,416].
The numbers just sort of roll off the tongue. I think I might approach the presentation of this paper like at a humanities conference, where I read it. Aloud. With no slides. It is really only 7 pages of text, and 14 pages of references. I can skip the references.

And in the style of Harvard author name and date referencing, here is the first line of my paper:

Despite much work [Abeßer et al., 2008, 2009, 2010, 2012, Ahonen, 2010, Ahrendt et al., 2004, 2005, Ahrendt, 2006, Almoosa et al., 2010, Anan et al., 2011, Andén and Mallat, 2011, Anglade et al., 2009a,b, 2010, Annesi et al., 2007, Arabi and Lu, 2009, Arenas-Garcia et al., 2006, Ariyaratne and Zhang, 2012, Aryafar and Shokoufandeh, 2011, Aryafar et al., 2012, Aucouturier and Pachet, 2002, 2003, Aucouturier and Pampalk, 2008, Aucouturier, 2009, Avcu et al., 2007, Bagci and Erzin, 2006, Bağci and Erzin, 2007, Balkema, 2007, Balkema and van der Heijden, 2010, Barbedo and Lopes, 2007, Barbedo, 2008, Barbieri et al., 2010, Barreira et al., 2011, Basili et al., 2004, Behun, 2012, Benetos and Kotropoulos, 2008, 2010, Bergstra et al., 2006, Bergstra, 2006, Bergstra et al., 2010, Bickerstaffe and Makalic, 2003, Bigerelle and Iost, 2000, Blume et al., 2008, Brecheisen et al., 2006, Burred and Lerch, 2003, Burred, 2004, 2005, Burred and Peeters, 2009, Casey et al., 2008, Cataltepe et al., 2007, Chai and Vercoe, 2001, Chang et al., 2008, 2010, Charami et al., 2007, Charbuillet et al., 2011, Chase, 2001, Chen et al., 2006, 2008, 2009, Chen and Chen, 2009, Chen et al., 2010, Chew et al., 2005, Cilibrasi et al., 2004, Cilibrasi and Vitanyi, 2005, Cornelis et al., 2010, Correa et al., 2010, Costa et al., 2004, 2011, 2012b,a, Craft et al., 2007, Craft, 2007, Cruz-Alcázar and Vidal, 2008, Dannenberg et al., 2001, Dannenberg, 2010, DeCoro et al., 2007, Dehghani and Lovett, 2006, Dellandrea et al., 2005, Deshpande et al., 2001, Dieleman et al., 2011, Diodati and Piazza, 2000, Dixon et al., 2003, 2004, 2010, Doraisamy et al., 2008, Doraisamy and Golzari, 2010, Downie et al., 2005, Downie, 2008, Downie et al., 2010, Draman et al., 2010, 2011, Esmaili et al., 2004, Ezzaidi and Rouat, 2007, Ezzaidi et al., 2009, Fadeev et al., 2009, Fernandez et al., 2011, Fernández and Chávez, 2012, Fiebrink and Fujinaga, 2006, Flexer et al., 2005, 2006, Flexer, 2006, 2007, Flexer and Schnitzer, 2009, 2010, Frederico, 2004, Fu et al., 2010a,b, 2011a,b, García et al., 2007, Garcia-Garcia et al., 2010, García et al., 2012, Gedik and Alpkocak, 2006, Genussov and Cohen, 2010, Gjerdingen and Perrott, 2008, Golub, 2000, Golzari et al., 2008a,c,b, González et al., 2010, Goto et al., 2003, Goulart et al., 2011, 2012, Gouyon et al., 2004, Gouyon and Dixon, 2004, Gouyon, 2005, Grimaldi et al., 2003, 2006, Grosse et al., 2007, Guaus, 2009, Hamel and Eck, 2010, Han et al., 1998, Hansen et al., 2005, Harb et al., 2004, Harb and Chen, 2007, Hartmann, 2011, Heittola, 2003, Henaff et al., 2011, Herkiloglu et al., 2006, de la Higuera et al., 2005, Hillewaere et al., 2012, Holzapfel and Stylianou, 2007, 2008a,b, 2009, Homburg et al., 2005, Honingh and Bod, 2011, Hsieh et al., 2012, Hu and Ogihara, 2012, Iñesta et al., 2009, ISMIR, 2004, ISMIS, 2011, Izmirli, 2009, Jang et al., 2008, Jennings et al., 2004, Jensen et al., 2006, Jiang et al., 2002, Jin and Bie, 2006, Lu et al., 2009, Jothilakshmi and Kathiresan, 2012, Ju et al., 2010, Kaminskas and Ricci, 2012, Karkavitsas and Tsihrintzis, 2011, 2012, Karydis, 2006, Karydis et al., 2006, Kiernan, 2000, Kim and Cho, 2011, Kini et al., 2011, Kirss, 2007, Kitahara et al., 2008, Kobayakawa and Hoshi, 2011, Koerich and Poitevin, 2005, Kofod and Ortiz-Arroyo, 2008, Kosina, 2002, Kostek et al., 2011, Kotropoulos et al., 2010, Krumhansl, 2010, Kuo and Shan, 2004, Lambrou et al., 1998, Lampropoulos et al., 2005, 2010, 2012, Langlois and Marques, 2009a,b, Lee and Downie, 2004, Lee et al., 2006, 2007, 2008, 2009b,a,c, 2011, Lehn-Schioler et al., 2006, de Leon and Inesta, 2002, de León and Iñesta, 2003, 2004, de Leon and Inesta, 2007, de Leon and Martinez, 2012, Levy and Sandler, 2006, Li et al., 2003, Li and Tzanetakis, 2003, Li and Ogihara, 2004, Li and Sleep, 2005, Li and Ogihara, 2005, 2006, Li et al., 2009, 2010, Li and Chan, 2011, Lidy and Rauber, 2003, Lidy, 2003, Lidy and Rauber, 2005, Lidy, 2006, Lidy et al., 2007, Lidy and Rauber, 2008, Lidy et al., 2010b,a, Lim et al., 2011, Lin et al., 2004, Lippens et al., 2004, Liu et al., 2007, 2008, 2009a,b, Lo and Lin, 2010, Loh and Emmanuel, 2006, Lopes et al., 2010, Lukashevich et al., 2009, Lukashevich, 2012, M. et al., 2011, Mace et al., 2011, Manaris et al., 2005, 2008, 2011, Mandel et al., 2006, Manzagol et al., 2008, Markov and Matsui, 2012, Marques and Langlois, 2009, Marques et al., 2010, 2011b,a, Matityaho and Furst, 1995, Mayer et al., 2008b, Mayer and Rauber, 2010a,b, Mayer et al., 2010, Mayer and Rauber, 2011, McKay and Fujinaga, 2004, McKay, 2004, McKay and Fujinaga, 2005, 2006, 2008, McKay, 2010, McKay and Fujinaga, 2010, McKay et al., 2010, McKinney and Breebaart, 2003, Meng et al., 2005, Meng and Shawe-Taylor, 2008, Mierswa and Morik, 2005, MIREX, 2005, 2007, 2008, 2009, 2010, 2011, 2012, Mitra and Wang, 2008, Mitri et al., 2004, Moerchen et al., 2005, 2006, Nagathil et al., 2010, 2011, Nayak and Bhutani, 2011, Neubarth et al., 2011, Neumayer and Rauber, 2007, Nie et al., 2009, Nopthaisong and Hasan, 2007, Norowi et al., 2005, Novello et al., 2006, Orio, 2006, Orio et al., 2011, Pampalk et al., 2003, 2005, Pampalk, 2006, Panagakis et al., 2008, 2009a,b, 2010a,b, Panagakis and Kotropoulos, 2010, Paradzinets et al., 2009, Park, 2009a,b, 2010, Park et al., 2011, Peeters, 2007, 2011, Iñesta and Rizo, 2009, Pérez et al., 2010, Pérez-Sancho et al., 2005, Pérez et al., 2008, Perez et al., 2008, 2009, Pérez, 2009, Pohle, 2005, Pohle et al., 2006, 2008, 2009, Porter and Neuringer, 1984, Pye, 2000, Rafailidis et al., 2009, Rauber and Frühwirth, 2001, Rauber et al., 2002, Ravelli et al., 2010, Reed and Lee, 2006, 2007, Rin et al., 2010, Ren and Jang, 2011, 2012, Ribeiro et al., 2012, Rizzi et al., 2008, Rocha, 2011, Rump et al., 2010, Ruppin and Yeshurun, 2006, Salamon et al., 2012, Sanden et al., 2008, 2010, Sanden and Zhang, 2011a,b, Sanden et al., 2012, de los Santos, 2010, Scaringella and Zoia, 2005, Scaringella et al., 2006, Schierz and Budka, 2011, Schindler et al., 2012, Schindler and Rauber, 2012, Seo and Lee, 2011, Seo, 2011, Serra et al., 2011, Seyerlehner, 2010, Seyerlehner et al., 2010, 2011, Shao et al., 2004, Shen et al., 2005, 2006, 2010, Silla et al., 2006, 2007, 2008a,b, Silla and Freitas, 2009, Silla et al., 2009, 2010, Silla and Freitas, 2011, Simsekli, 2010, Soltau, 1997, Soltau et al., 1998, Song et al., 2007, Song and Zhang, 2008, Sonmez, 2005, Sordo et al., 2008, Sotiropoulos et al., 2008, Srinivasan and Kankanhalli, 2004, Sturm and Noorzad, 2012, Sturm, 2012a,b, Sundaram and Narayanan, 2007, Happi Tietche et al., 2012, Tsai and Bao, 2010, Tsatsishvili, 2011, Tsunoo et al., 2009a,b, 2011, Turnbull and Elkan, 2005, Typke et al., 2005, Tzagkarakis et al., 2006, Tzanetakis et al., 2001, Tzanetakis and Cook, 2002, Tzanetakis, 2002, Tzanetakis et al., 2003, Umapathy et al., 2005, Valdez and Guevara, 2011, Vatolkin et al., 2010, 2011, Vatolkin, 2012, Völkel et al., 2010, Wang et al., 2008, 2009, 2010, Weihs et al., 2007, Welsh et al., 1999, West and Cox, 2004, 2005, West and Lamere, 2007, West, 2008, Whitman and Smaragdis, 2002, Wiggins, 2009, Wu et al., 2011, Wülfing and Riedmiller, 2012, Xu et al., 2003, Yang et al., 2011a,b, Yao et al., 2010, Yaslan and Cataltepe, 2006a,b, 2009, Yeh and Yang, 2012, Ying et al., 2012, Yoon et al., 2005, Zanoni et al., 2012, Zeng et al., 2009, Zhang and Zhou, 2003, Zhang et al., 2008, Zhen and Xu, 2010a,b, Zhou et al., 2012, Zhu et al., 2004], music genre recognition (MGR) remains a compelling problem to solve by a machine.


It started when I read the first sentence of the introduction of D. P. L. and K. Suresh, "An optimized feature set for music genre classification based on Support Vector Machine", in Proc. Recent Advances in Intelligent Computational Systems, Sep. 2011. They write:

Music is now so readily accessible in digital form that personal collections can easily exceed the practical limits on the time we have to listen to them: ten thousand music tracks on a personal music device have a total duration of approximately 30 days of continuous audio.
Then I googled "Music is now so readily accessible in digital form", and look at this! The top hit is from an article in press: Angelina Tzacheva, Dirk Schlingmann, Keith Bell, "Automatic Detection of Emotions with Music Files", Int. J. Social Network Mining, in press 2012. I can't read the entire article, but the first two sentences of the abstract are:

The amount of music files available on the Internet is constantly growing, as well as the access to recordings. Music is now so readily accessible in digital form that personal collections can easily exceed the practical limits of the time we have to listen to them.
The source of this text, however, is in the third search result: M. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes and M. Slaney, "Content-based Music Information Retrieval: Current Directions and Future Challenges", Proc. IEEE, vol. 96, no. 4, pp. 668-696, Apr. 2008. The first sentence of their introduction is an exact match to the text in L. and Suresh:

Music is now so readily accessible in digital form that personal collections can easily exceed the practical limits on the time we have to listen to them: ten thousand music tracks on a personal music device have a total duration of approximately 30 days of continuous audio.
I don't care to search for other examples of plagiarism in this publication, or that of Tzacheva et al. Even finding one lifted sentence in a work tells me how much time I should spend with it. Better for me to just write a blog post about it, and then send a complaint to IEEE.
Hello, and welcome to Paper of the Day (Po'D): Semantic gap?? Schemantic Schmap!! Edition. Today's paper provides an interesting argument for what is necessary to push forward the field of "Music Information Retrieval": G. A. Wiggins, "Semantic gap?? Schemantic Schmap!! Methodological Considerations in the Scientific Study of Music," Proc. IEEE Int. Symp. Multimedia, pp. 477-482, San Diego, CA, Dec. 2009.

My one line summary of this work is:
The sampled audio signal is only half of half of half of the story.
I have been experimenting with the approach to feature extraction posed in J. Andén and S. Mallat, "Multiscale scattering for audio classification," Proc. Int. Soc. Music Info. Retrieval, 2011. Specifically, I have used these "scattering coefficients" in place of the features used by Bergstra et al. 2006 as the input to AdaBoost for music genre recognition.

The idea behind the features reminds me of the temporal modulation analysis of Panagakis et al. 2009, which itself comes from S. A. Shamma, "Encoding sound timbre in the auditory system", IETE J. Research, vol. 49, no. 2, pp. 145-156, Mar.-Apr. 2003. One difference is that these scattering coefficients are not psychoacoustically derived, yet they appear just as powerful as those that are.

Computer Music Judges

Just today on BBC News, a nice story about some of the first computerized music judges. More specific details of this interesting event are here.

Here are the winners.

Music genre recognition results

I have finally completed my paper "Three revealing experiments in music genre recognition", submitted to ISMIR 2012, which formalizes my results here, here, here, here, here, and here. I make available here the attendant code for reproducing all experiments and figures.

My one line summary of my work is:
Two of the most accurate systems for automatically recognizing music genre are not recognizing music genre, but something else.
That something else is the subject of further work, not to mention replicating and testing other systems --- starting with this curious work.

Thank you to all commentators and test participants!
I am running some listening tests now, and thought that, maybe, you know, you might like to participate, or something. Your task is to assign the best genre to each of 30 short music excerpts (12 seconds). The ten genres are: Blues, Classical, Country, Disco, Hip hop, Jazz, Metal, Pop, Reggae, and Rock. Your first 10 answers must be correct, otherwise you will not continue to the remaining 20 excerpts. You can listen to each excerpt as many times as you like, but you cannot go back and change your answers. You also must choose one of the 10 genres for each example. Some genres may appear more than others.

If you are a human:
  1. download my software
  2. unzip it, but do not look at the contents!
  3. open MATLAB (I am looking for a way to compile it as a stand-alone program)
  4. in MATLAB change to the "humantest" directory
  5. type "testgui" in the MATLAB command window
  6. put on headphones and take the test
  7. send me the text file placed in the directory, e.g., test_08793.txt, and tell me a little about your experience, e.g., was it easy? Did you enjoy it? Were some excerpts easier to categorize than others? Or maybe the software spit out errors. In that case, send me the errors.
  8. Finally, drag the directory to the trash, and do not repeat the test.
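The screening rule of the test can be sketched in a few lines. This is only an illustration in Python, not the MATLAB test software: `run_test` and its arguments are hypothetical names, and it assumes that the check happens once, after the tenth excerpt, rather than stopping at the first wrong answer.

```python
GENRES = ("Blues", "Classical", "Country", "Disco", "Hip hop",
          "Jazz", "Metal", "Pop", "Reggae", "Rock")
NUM_SCREENING = 10   # the first 10 answers must all be correct
NUM_EXCERPTS = 30    # total number of excerpts in the test


def run_test(answers, truth):
    """Return the recorded (answer, is_correct) pairs for one participant.

    Assumption: the screening check is applied after the 10th excerpt,
    and the test ends there if any of the first 10 answers was wrong.
    """
    if len(truth) != NUM_EXCERPTS:
        raise ValueError("the test uses exactly 30 excerpts")
    recorded = []
    for index, (answer, true_genre) in enumerate(zip(answers, truth)):
        if answer not in GENRES:
            raise ValueError("one of the ten genres must be chosen")
        # Answers are appended once and never revised (no going back).
        recorded.append((answer, answer == true_genre))
        if index == NUM_SCREENING - 1 and not all(ok for _, ok in recorded):
            break  # failed screening: no remaining 20 excerpts
    return recorded
```

A participant who mislabels any of the first ten excerpts therefore contributes 10 recorded answers, and everyone else contributes 30.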
Considering that I have a trained music genre classifier --- the same one as I used yesterday --- it would be great to ask of it, "For what are you listening to make a decision?" In my experiment yesterday, I showed that we may simply remaster a single piece of music in one genre to be classified as one of nine others, thus showing the system's decision making strategy to be very fragile, and ultimately illogical. In today's experiment, I am asking the classifier for help in composing musical excerpts that it finds are "definitely" one genre and not the others. To do this, I have built a system that randomly selects and mixes together 3 loops from the 1,198 loops included with Garage Band. These loops include all sorts of drum patterns, piano and guitar comps, brass hits, synthesizer pads, sound effects, etc. After the system composes an excerpt, the classifier listens to it and provides a confidence of it belonging to each of the ten genres. If the difference in confidences between the two highest-rated genres, e.g., Metal and Rock, is larger than what it has seen before, I keep the piece, record the difference for the Metal genre, and continue to search for something even more, e.g., Metal, than my previous iterations. In this way, I hope to build the ideal excerpts of each genre, according to the trained classifier, and thereby uncover its idealized models.
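The search described above can be sketched roughly as follows, in Python rather than MATLAB. This is a minimal sketch under loud assumptions, not the actual system: `toy_classifier` is a hypothetical stand-in that returns made-up confidences, loops are represented by integer ids instead of mixed audio, and the function names and the number of iterations are invented for illustration.

```python
import random

GENRES = ["Blues", "Classical", "Country", "Disco", "Hip hop",
          "Jazz", "Metal", "Pop", "Reggae", "Rock"]
NUM_LOOPS = 1198     # loops included with Garage Band, per the post
LOOPS_PER_MIX = 3    # loops mixed together per composed excerpt


def toy_classifier(loop_ids):
    """Hypothetical stand-in for the trained classifier.

    Returns a made-up confidence per genre that depends only on which
    loops were mixed; a real system would analyze the mixed audio.
    """
    rng = random.Random(",".join(str(i) for i in sorted(loop_ids)))
    scores = [rng.random() for _ in GENRES]
    total = sum(scores)
    return {genre: s / total for genre, s in zip(GENRES, scores)}


def search(classify, iterations=2000, seed=0):
    """Random search for the most 'definite' excerpt of each genre."""
    rng = random.Random(seed)
    best = {}  # genre -> (largest margin seen so far, loop ids of that mix)
    for _ in range(iterations):
        loop_ids = tuple(rng.sample(range(NUM_LOOPS), LOOPS_PER_MIX))
        confidences = classify(loop_ids)
        ranked = sorted(confidences, key=confidences.get, reverse=True)
        top, runner_up = ranked[0], ranked[1]
        margin = confidences[top] - confidences[runner_up]
        # Keep the mix only if it beats the best margin for its top genre.
        if margin > best.get(top, (0.0, None))[0]:
            best[top] = (margin, loop_ids)
    return best


if __name__ == "__main__":
    for genre, (margin, loop_ids) in sorted(search(toy_classifier).items()):
        print(f"{genre}: margin {margin:.3f} from loops {loop_ids}")
```

The dictionary `best` plays the role of the kept pieces: each entry holds the combination of loops that the classifier most confidently separates from its second choice.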
Bob Wills & His Texas Playboys is one of the great American Western Swing bands of the last century, with a classic sound that helped define the genre: two fiddles, steel and electric guitars, and hollering courtesy of Wills. Here is a short excerpt of their dance hit "Big Balls in Cowtown":

Previously, I discussed the dismal performance of two state-of-the-art music genre recognition systems when it comes to being trained on hi-fi musical audio, but classifying lo-fi musical audio --- a transformation that unarguably preserves the genre of the underlying music, and to which any system actually using features related to genre should be robust. Then, I discussed specific instances where these systems, even when trained and tested using hi-fi musical audio, make classifications that are "unforgivable", in the sense that ABBA's "Dancing Queen" is Metal, "A whole new world" from Aladdin and "The Lion Sleeps Tonight" are Blues, and "Disco Duck" is Rock. Regardless of their 80% mean accuracies, such behavior tells me that these algorithms act like a child memorizing irrelevant things ("Metal is loud; Classical music is soft"), instead of the stylistic indicators humans actually use to classify and discuss music ("Hip hop often makes use of samples and emphasizes speech more than singing; Blues typically has strophic form built upon 12 bars in common time; Jazz typically emphasizes variations upon themes passed around as solos in small ensembles; Disco is often danceable with a strong regular beat, and emphasizes bass lines, hi-hats, and sexual subjects"). The means and variances of MFCCs, or modulations in time-frequency power spectra, do not embody characteristics relevant to classifying music genre; and thus these algorithms do not embody anything related to music genre, part 2.
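One limitation of such summary statistics is easy to demonstrate: the per-dimension mean and variance of a sequence of feature frames do not change when the frames are reordered, so on their own they cannot encode temporal structure such as a 12-bar form. The Python sketch below is only an illustration of that point. The random numbers are hypothetical stand-ins for MFCC frames, and nothing here is taken from the systems discussed above.

```python
import math
import random
from statistics import fmean, pvariance


def summarize(frames):
    """Collapse a sequence of frames into per-dimension (mean, variance)."""
    return [(fmean(dim), pvariance(dim)) for dim in zip(*frames)]


def summaries_match(a, b, tol=1e-9):
    """True if two summaries agree in every dimension, up to rounding."""
    return all(math.isclose(x, y, abs_tol=tol)
               for pair_a, pair_b in zip(a, b)
               for x, y in zip(pair_a, pair_b))


rng = random.Random(0)
# Hypothetical stand-in for MFCC frames: 200 frames of 4 coefficients each.
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]

# Scramble the temporal order of the frames.
shuffled = frames[:]
rng.shuffle(shuffled)

# The two summaries agree even though the frame order differs.
print(summaries_match(summarize(frames), summarize(shuffled)))
```

The comparison holds for any reordering of the frames, which is why a classifier fed only these statistics has no access to the order in which musical events occur.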
