Recently in Music Category

Hello, and welcome to the Paper of the Day (Po'D): A Survey of Evaluation in Music Genre Recognition. Today's paper is B. L. Sturm, "A Survey of Evaluation in Music Genre Recognition", Proc. Adaptive Multimedia Retrieval, Copenhagen, Denmark, Oct. 2012.

This paper is best summarized by a particularly riveting line of section 2.2:

The most-used publicly available dataset in music genre recognition work is that produced in [378,379], often called "GTZAN." This audio dataset appears in more than 23% (96) of the references [5,11,14,16,18,27,33,35-40,53,57,58, 84,91,106,107, 109, 114, 130, 131, 136, 138, 142, 143, 163, 164, 177, 182, 191, 199, 201, 202, 204-206, 208, 209, 212-215, 217, 218, 223, 236, 237, 240, 241, 246, 270, 272, 285-290, 314, 318, 319, 322, 323, 325, 331, 336, 337, 339-341, 344, 345, 362-366, 368, 371-374, 377-379, 398,399,402,404,405, 407,411,416].
The numbers just sort of roll off the tongue. I think I might present this paper as one would at a humanities conference: I read it. Aloud. With no slides. It is really only 7 pages of text and 14 pages of references. I can skip the references.

And in the style of Harvard author name and date referencing, here is the first line of my paper:

Despite much work [Abeßer et al., 2008, 2009, 2010, 2012, Ahonen, 2010, Ahrendt et al., 2004, 2005, Ahrendt, 2006, Almoosa et al., 2010, Anan et al., 2011, Andén and Mallat, 2011, Anglade et al., 2009a,b, 2010, Annesi et al., 2007, Arabi and Lu, 2009, Arenas-Garcia et al., 2006, Ariyaratne and Zhang, 2012, Aryafar and Shokoufandeh, 2011, Aryafar et al., 2012, Aucouturier and Pachet, 2002, 2003, Aucouturier and Pampalk, 2008, Aucouturier, 2009, Avcu et al., 2007, Bagci and Erzin, 2006, Bağci and Erzin, 2007, Balkema, 2007, Balkema and van der Heijden, 2010, Barbedo and Lopes, 2007, Barbedo, 2008, Barbieri et al., 2010, Barreira et al., 2011, Basili et al., 2004, Behun, 2012, Benetos and Kotropoulos, 2008, 2010, Bergstra et al., 2006, Bergstra, 2006, Bergstra et al., 2010, Bickerstaffe and Makalic, 2003, Bigerelle and Iost, 2000, Blume et al., 2008, Brecheisen et al., 2006, Burred and Lerch, 2003, Burred, 2004, 2005, Burred and Peeters, 2009, Casey et al., 2008, Cataltepe et al., 2007, Chai and Vercoe, 2001, Chang et al., 2008, 2010, Charami et al., 2007, Charbuillet et al., 2011, Chase, 2001, Chen et al., 2006, 2008, 2009, Chen and Chen, 2009, Chen et al., 2010, Chew et al., 2005, Cilibrasi et al., 2004, Cilibrasi and Vitanyi, 2005, Cornelis et al., 2010, Correa et al., 2010, Costa et al., 2004, 2011, 2012b,a, Craft et al., 2007, Craft, 2007, Cruz-Alcázar and Vidal, 2008, Dannenberg et al., 2001, Dannenberg, 2010, DeCoro et al., 2007, Dehghani and Lovett, 2006, Dellandrea et al., 2005, Deshpande et al., 2001, Dieleman et al., 2011, Diodati and Piazza, 2000, Dixon et al., 2003, 2004, 2010, Doraisamy et al., 2008, Doraisamy and Golzari, 2010, Downie et al., 2005, Downie, 2008, Downie et al., 2010, Draman et al., 2010, 2011, Esmaili et al., 2004, Ezzaidi and Rouat, 2007, Ezzaidi et al., 2009, Fadeev et al., 2009, Fernandez et al., 2011, Fernández and Chávez, 2012, Fiebrink and Fujinaga, 2006, Flexer et al., 2005, 2006, Flexer, 2006, 2007, Flexer and Schnitzer, 2009, 2010, Frederico, 2004, Fu et al., 2010a,b, 2011a,b, García et al., 2007, Garcia-Garcia et al., 2010, García et al., 2012, Gedik and Alpkocak, 2006, Genussov and Cohen, 2010, Gjerdingen and Perrott, 2008, Golub, 2000, Golzari et al., 2008a,c,b, González et al., 2010, Goto et al., 2003, Goulart et al., 2011, 2012, Gouyon et al., 2004, Gouyon and Dixon, 2004, Gouyon, 2005, Grimaldi et al., 2003, 2006, Grosse et al., 2007, Guaus, 2009, Hamel and Eck, 2010, Han et al., 1998, Hansen et al., 2005, Harb et al., 2004, Harb and Chen, 2007, Hartmann, 2011, Heittola, 2003, Henaff et al., 2011, Herkiloglu et al., 2006, de la Higuera et al., 2005, Hillewaere et al., 2012, Holzapfel and Stylianou, 2007, 2008a,b, 2009, Homburg et al., 2005, Honingh and Bod, 2011, Hsieh et al., 2012, Hu and Ogihara, 2012, Iñesta et al., 2009, ISMIR, 2004, ISMIS, 2011, Izmirli, 2009, Jang et al., 2008, Jennings et al., 2004, Jensen et al., 2006, Jiang et al., 2002, Jin and Bie, 2006, Lu et al., 2009, Jothilakshmi and Kathiresan, 2012, Ju et al., 2010, Kaminskas and Ricci, 2012, Karkavitsas and Tsihrintzis, 2011, 2012, Karydis, 2006, Karydis et al., 2006, Kiernan, 2000, Kim and Cho, 2011, Kini et al., 2011, Kirss, 2007, Kitahara et al., 2008, Kobayakawa and Hoshi, 2011, Koerich and Poitevin, 2005, Kofod and Ortiz-Arroyo, 2008, Kosina, 2002, Kostek et al., 2011, Kotropoulos et al., 2010, Krumhansl, 2010, Kuo and Shan, 2004, Lambrou et al., 1998, Lampropoulos et al., 2005, 2010, 2012, Langlois and Marques, 2009a,b, Lee and Downie, 2004, Lee et al., 2006, 2007, 2008, 2009b,a,c, 2011, Lehn-Schioler et al., 2006, de Leon and Inesta, 2002, de León and Iñesta, 2003, 2004, de Leon and Inesta, 2007, de Leon and Martinez, 2012, Levy and Sandler, 2006, Li et al., 2003, Li and Tzanetakis, 2003, Li and Ogihara, 2004, Li and Sleep, 2005, Li and Ogihara, 2005, 2006, Li et al., 2009, 2010, Li and Chan, 2011, Lidy and Rauber, 2003, Lidy, 2003, Lidy and Rauber, 2005, Lidy, 2006, Lidy et al., 2007, Lidy and Rauber, 2008, Lidy et al., 2010b,a, Lim et al., 2011, Lin et al., 2004, Lippens et al., 2004, Liu et al., 2007, 2008, 2009a,b, Lo and Lin, 2010, Loh and Emmanuel, 2006, Lopes et al., 2010, Lukashevich et al., 2009, Lukashevich, 2012, M. et al., 2011, Mace et al., 2011, Manaris et al., 2005, 2008, 2011, Mandel et al., 2006, Manzagol et al., 2008, Markov and Matsui, 2012, Marques and Langlois, 2009, Marques et al., 2010, 2011b,a, Matityaho and Furst, 1995, Mayer et al., 2008b, Mayer and Rauber, 2010a,b, Mayer et al., 2010, Mayer and Rauber, 2011, McKay and Fujinaga, 2004, McKay, 2004, McKay and Fujinaga, 2005, 2006, 2008, McKay, 2010, McKay and Fujinaga, 2010, McKay et al., 2010, McKinney and Breebaart, 2003, Meng et al., 2005, Meng and Shawe-Taylor, 2008, Mierswa and Morik, 2005, MIREX, 2005, 2007, 2008, 2009, 2010, 2011, 2012, Mitra and Wang, 2008, Mitri et al., 2004, Moerchen et al., 2005, 2006, Nagathil et al., 2010, 2011, Nayak and Bhutani, 2011, Neubarth et al., 2011, Neumayer and Rauber, 2007, Nie et al., 2009, Nopthaisong and Hasan, 2007, Norowi et al., 2005, Novello et al., 2006, Orio, 2006, Orio et al., 2011, Pampalk et al., 2003, 2005, Pampalk, 2006, Panagakis et al., 2008, 2009a,b, 2010a,b, Panagakis and Kotropoulos, 2010, Paradzinets et al., 2009, Park, 2009a,b, 2010, Park et al., 2011, Peeters, 2007, 2011, Iñesta and Rizo, 2009, Pérez et al., 2010, Pérez-Sancho et al., 2005, Pérez et al., 2008, Perez et al., 2008, 2009, Pérez, 2009, Pohle, 2005, Pohle et al., 2006, 2008, 2009, Porter and Neuringer, 1984, Pye, 2000, Rafailidis et al., 2009, Rauber and Frühwirth, 2001, Rauber et al., 2002, Ravelli et al., 2010, Reed and Lee, 2006, 2007, Rin et al., 2010, Ren and Jang, 2011, 2012, Ribeiro et al., 2012, Rizzi et al., 2008, Rocha, 2011, Rump et al., 2010, Ruppin and Yeshurun, 2006, Salamon et al., 2012, Sanden et al., 2008, 2010, Sanden and Zhang, 2011a,b, Sanden et al., 2012, de los Santos, 2010, Scaringella and Zoia, 2005, Scaringella et al., 2006, Schierz and Budka, 2011, Schindler et al., 2012, Schindler and Rauber, 2012, Seo and Lee, 2011, Seo, 2011, Serra et al., 2011, Seyerlehner, 2010, Seyerlehner et al., 2010, 2011, Shao et al., 2004, Shen et al., 2005, 2006, 2010, Silla et al., 2006, 2007, 2008a,b, Silla and Freitas, 2009, Silla et al., 2009, 2010, Silla and Freitas, 2011, Simsekli, 2010, Soltau, 1997, Soltau et al., 1998, Song et al., 2007, Song and Zhang, 2008, Sonmez, 2005, Sordo et al., 2008, Sotiropoulos et al., 2008, Srinivasan and Kankanhalli, 2004, Sturm and Noorzad, 2012, Sturm, 2012a,b, Sundaram and Narayanan, 2007, Happi Tietche et al., 2012, Tsai and Bao, 2010, Tsatsishvili, 2011, Tsunoo et al., 2009a,b, 2011, Turnbull and Elkan, 2005, Typke et al., 2005, Tzagkarakis et al., 2006, Tzanetakis et al., 2001, Tzanetakis and Cook, 2002, Tzanetakis, 2002, Tzanetakis et al., 2003, Umapathy et al., 2005, Valdez and Guevara, 2011, Vatolkin et al., 2010, 2011, Vatolkin, 2012, Völkel et al., 2010, Wang et al., 2008, 2009, 2010, Weihs et al., 2007, Welsh et al., 1999, West and Cox, 2004, 2005, West and Lamere, 2007, West, 2008, Whitman and Smaragdis, 2002, Wiggins, 2009, Wu et al., 2011, Wülfing and Riedmiller, 2012, Xu et al., 2003, Yang et al., 2011a,b, Yao et al., 2010, Yaslan and Cataltepe, 2006a,b, 2009, Yeh and Yang, 2012, Ying et al., 2012, Yoon et al., 2005, Zanoni et al., 2012, Zeng et al., 2009, Zhang and Zhou, 2003, Zhang et al., 2008, Zhen and Xu, 2010a,b, Zhou et al., 2012, Zhu et al., 2004], music genre recognition (MGR) remains a compelling problem to solve by a machine.

A misguided study?

H. Jennings, P. Ivanov, A. Martins, P. da Silva, and G. Viswanathan, "Variance fluctuations in nonstationary time series: a comparative study of music genres," Physica A: Statistical and Theoretical Physics, vol. 336, pp. 585-594, May 2004.

In classic physics style, this work essentially reduces the music signal to an amplitude envelope, and then makes claims about entire genres based on correlations.
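To give a sense of how much such a reduction throws away, here is a minimal sketch of one common way to compute an amplitude envelope: rectify the signal and smooth it with a moving average. This is an illustration only, not the procedure used in the paper; the function name `amplitude_envelope` and the `window` parameter are assumptions made for this example.

```python
def amplitude_envelope(samples, window):
    """Moving average of the rectified signal over the last `window` samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rectified = [abs(x) for x in samples]
    envelope = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            # Drop the sample that just left the averaging window.
            running -= rectified[i - window]
        # Near the start, average over the samples seen so far.
        envelope.append(running / min(i + 1, window))
    return envelope


if __name__ == "__main__":
    # Two signals with different sign patterns but identical magnitudes
    # collapse to the same envelope: the detail in the waveform is gone.
    a = [1.0, -1.0, 1.0, -1.0]
    b = [1.0, 1.0, -1.0, -1.0]
    print(amplitude_envelope(a, 2) == amplitude_envelope(b, 2))
```

Whatever distinguishes the two input signals in the example is invisible after the reduction, since only the smoothed magnitude survives.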

The Medium Shapes the Message

Here is a nice article by David Byrne on how the "venue" shapes his sound, as well as that of many other artists, including birds. It reminds me of a fantastic course I took in graduate school about how minimalism (in music) arose and developed in part from the dawn of the LP and hi-fidelity.
The paper, R. B. Dannenberg, B. Thom, and D. Watson, "A machine learning approach to musical style recognition," in Proc. International Computer Music Conf., Thessaloniki, Greece, Sep. 1997, is regarded as the first to explore something like recognizing the genre of a musical signal. It proposes a system to determine the playing style of a musician. However, I have just discovered the following fascinating paper: K.-P. Han, Y.-S. Park, S.-G. Jeon, G.-C. Lee, and Y.-H. Ha, "Genre classification system of TV sound signals based on a spectrogram analysis," IEEE Transactions on Consumer Electronics, vol. 44, pp. 33-42, Feb. 1998. In that paper, they look at discriminating between speech and music, and Jazz, Classical and Popular genres. Not only do they simulate the algorithm, they actually implement the system using circuits and show the results. They also list the musical pieces they put in each genre dataset. Was Kansas Popular in 1998?

Music genre flowchart

[Figure: flow.png] From: T. Zhang, "Semi-automatic approach for music classification," in Proc. SPIE Conf. on Internet Multimedia Management Systems, 2003.

The author put together a flowchart for automatic classification. I was curious about "detect features of symphony", especially when one only has a 30-second clip: "Since a symphony is composed of multiple movements and repetitions, there is an alternation between relatively high volume audio signal (e.g. performance of the whole orchestra) and low volume audio signal (e.g. performance of single instrument or a few instruments of the orchestra) along the music piece. ... Thus, by checking the existence of alternation between high volume and low volume intervals (with each interval longer than a certain threshold) and/or repetition(s) in the whole music piece, symphonies will be distinguished [from other genres]."
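One way to read the quoted description is sketched below: compute a level per frame, label each frame loud or quiet, keep only runs longer than some minimum, and count how often the long runs switch between loud and quiet. This is a rough sketch under stated assumptions, not the paper's implementation; the function names, the RMS level measure, the fixed `threshold`, `min_frames`, and `min_changes` are all hypothetical choices made for illustration.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of consecutive non-overlapping frames (assumed level measure)."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def long_runs(levels, threshold, min_frames):
    """Group frames into loud/quiet runs and keep runs of at least min_frames."""
    runs = []
    for level in levels:
        label = "loud" if level >= threshold else "quiet"
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    return [(label, count) for label, count in runs if count >= min_frames]


def has_alternation(levels, threshold, min_frames, min_changes=2):
    """True if long loud and long quiet intervals alternate often enough."""
    kept = long_runs(levels, threshold, min_frames)
    changes = sum(1 for a, b in zip(kept, kept[1:]) if a[0] != b[0])
    return changes >= min_changes


if __name__ == "__main__":
    # Loud, quiet, loud: five frames each.
    levels = [1.0] * 5 + [0.1] * 5 + [1.0] * 5
    print(has_alternation(levels, threshold=0.5, min_frames=3))
```

Note that nothing in this check is specific to symphonies: any signal with sufficiently long loud and quiet stretches passes it.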

Props to the author for attempting the impossible, but any flowchart for assigning music genre must be broken from the very first decision. Genres are not uniquely specified by characteristics that mutually exclude others.

Music genre taxonomy

[Figure: genretax.png] From: J. G. A. Barbedo and A. Lopes, "Automatic genre classification of musical signals," EURASIP Journal on Advances in Signal Processing, 2007.

The authors specify the meaning of each of these labels. For instance, "Dance" music has "strong percussive elements and very marked beating." Stemming from "Dance" there is "Jazz", "characterized by the predominance of instruments like piano and saxophone. Electric guitars and drums can also be present; vocals, when present, are very characteristic." And stemming from "Dance," stemming from "Jazz," there is "Cool", a "jazz style [that is] light and introspective, with a very slow rhythm." The genres "Techno" and "Disco" --- which both emphasize the importance of listening with your body and feet --- do not stem from "Dance," but instead from "Pop/Rock," "the largest class, including a wide variety of songs."

Props to the authors for attempting the impossible, but any taxonomy of music genre must be broken from the very first stem. Genres are not like species, and cannot be arranged like so. (On the plus side, it appears that to differentiate introspective music from non-introspective music requires only four spectral features computed over 21.3 ms windows.)
From 1976, "Daddy Cool" by Boney M. is a certifiable classic Disco tune. Below are Boney M. singing and dancing "Daddy Cool." And I think Bobby Farrell's dancing might be a perfect reflection of cocaine use.

To me this is nearly perfect Disco: a square four on the floor with that typical open hi-hat between beats (this time with a flange!), simple yet memorable figures for strings and saxes, bass, female voices, and don't forget the sexual content! The only thing really wrong with it is that the track is no longer than 4 minutes. (And I would like a funkier bass-line.)

Now consider this 1993 remake by a pop group in Hungary.

Although note for note they are just about the same, to me, from the get-go, the latter is a sequenced and sterile version missing the essential hi-hats of the original. But is it so far away that I would not classify it as Disco?
Here is Faron Young in 1956 covering Don Gibson's "Sweet Dreams" for him and his eyebrows to get a ticket for the Checkerboard Showboat to continue "following those girls."

Now, here are the Pioneers covering the song 12 years later in a recording released in 1968 (is that a ukulele I hear?).

When I listen to this version, my own eyebrows rise as if to reach across space and time some 44 years to bring the vocals into key. It is precisely because of this, in our autotune-saturated world, that I really like this recording.

Here is Jim Reeves in 1959 singing "He'll Have To Go", this time without the pressure of enduring the Checkerboard Showboat.

Now, here is David Isaacs covering the song 10 years later in a recording released in 1969.

Those backup singers, with their bizarre harmonies, intentional or not, are precisely why I can't stand to listen to this recording at a low volume. My wife rolls her eyes when I play it loud, just as I always want to hear it.

Now, from his 1974 masterful record "Rhapsody in White," here is "Love's Theme" by Barry White performed by The Love Unlimited Orchestra.

Aside from the rich orchestration combining two rhythm guitars, roiling piano, lush strings, sweeping harp, and the drums and bass I could listen to for hours alone, I love this particular recording for a few reasons. First, popular music these days that combines classical elements, like strings, is essentially boring. I am looking at you, The Verve and Guns N' Roses. Second, around 1m45, when the horns take in the bridge, there is a wonderful maybe-flub by one of the players. Then from 3m07 to 3m11 the piano loses it, before nearly everything is taken away by a quite artificial but delicious rapid fade out at 3m16, leaving naked the rhythm guitars shivering alone with the bass and drums.


Disco in Bulgaria

Disco --- the music, the dress, the lifestyle --- was a phenomenon that had a clear beginning, peak, and denouement, at least in the USA and the UK. Contrary to what the hundreds of "Now that is what I call Disco" compilations available might suggest, Disco made inroads to many other places in the world --- places other than Western Europe and Scandinavia (ABBA).

I have been communicating with a colleague (NN) who is an expert in Bulgarian popular music, and he has graciously given me permission to quote our conversation. I indent his notes below.
This is one of the best mash-ups I have seen. We need so much more of this.

If you are wondering, that mad piano-playing dancing man is Neil Sedaka singing "Bad Blood":

The second singer is Teddy Pendergrass singing "Close the Door":

The man in the beginning is Bob McGrath from Sesame Street. Seeing him takes me back to my childhood when I was an avid watcher. :)
