Work in-between

I have unexpectedly devoted most of the past year to research in music genre recognition. In large part, this comes from a "discovery" that the most widely used publicly available dataset for work in this area has repetitions, artist duplication, and mislabelings. I am now putting the finishing touches on my thorough analysis of this dataset and the effects of its faults on the results produced with it. Here is a little graphic from my article.

[Figure: ex01-1.png]

This shows the highest accuracies reported in all the papers I have found that test genre recognition systems using all 10 classes of the GTZAN dataset. The numbers cross-reference the citations in my article (which took me a day to figure out how to do automatically :). The legend shows symbols for works that use two-fold cross validation (2fCV), and so on. The four red "x" marks are results that are incorrect, e.g., this.

The top gray line shows the maximum accuracy I estimate (optimistically) when considering the mislabelings in the dataset. If a system scores above it, then it might not be as good as a system that scores below it, with respect to recognizing genre (and we all should know now that classification accuracy is not enough :). The dashed gray line is the maximum accuracy I get using high-performing systems (two get above 83% accuracy on GTZAN) tested on a version of GTZAN with the replicas removed, and using artist filtering (the same artist does not appear in both the training and test sets).

Nearly all of the work I have found lies between the two lines, and none of the work shown uses an artist filter. Hence, we have over 90 papers that contain a decade of clearly optimistic (and quite possibly wrong) results.
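For readers who have not met artist filtering before, here is a rough Python sketch of the idea: split the data by whole artists, so that no artist contributes tracks to both the training and test sets. This is only an illustration, not the procedure used in the article. The function name `artist_filtered_split`, the `(track_id, artist, genre)` tuple layout, and the example tracks are all hypothetical, and the sketch does not try to balance genres across the two sets.

```python
import random
from collections import defaultdict


def artist_filtered_split(tracks, test_fraction=0.5, seed=0):
    """Split (track_id, artist, genre) tuples so no artist is in both sets.

    Hypothetical helper for illustration only; it ignores genre balance.
    """
    # Group tracks by artist so each artist is assigned as one unit.
    by_artist = defaultdict(list)
    for track in tracks:
        by_artist[track[1]].append(track)

    # Shuffle the artists (not the tracks) reproducibly.
    artists = sorted(by_artist)
    random.Random(seed).shuffle(artists)

    # Fill the test set one whole artist at a time until it reaches
    # roughly the requested share of tracks; the rest go to training.
    target_test_size = test_fraction * len(tracks)
    train, test = [], []
    for artist in artists:
        destination = test if len(test) < target_test_size else train
        destination.extend(by_artist[artist])
    return train, test


# Made-up example data: the artist names and labels are placeholders.
tracks = [
    ("t1", "artist_a", "rock"), ("t2", "artist_a", "rock"),
    ("t3", "artist_a", "blues"), ("t4", "artist_b", "jazz"),
    ("t5", "artist_b", "jazz"), ("t6", "artist_c", "disco"),
    ("t7", "artist_c", "pop"), ("t8", "artist_d", "reggae"),
]
train, test = artist_filtered_split(tracks)

# The defining property of an artist filter: no shared artists.
assert {t[1] for t in train}.isdisjoint({t[1] for t in test})
```

Because artists are assigned whole, the test set can end up somewhat larger than the requested fraction. A plain random split of tracks would hit the fraction exactly but would let the same artist land on both sides, which is the kind of leak the filter is meant to prevent.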

About this Entry

This page contains a single entry by Bob L. Sturm published on January 4, 2013 2:33 PM.
