Beginning from my research in music machine listening, I have become more and more aware of applications of machine learning to cultural products, and the pitfalls that accompany such work. I previously critiqued a study applying clustering of image features to photographs of paintings by different artists. Here is a new one: clustering of Shakespeare's plays into genres by word frequencies. (This work is published in: S. Allison, R. Heuser, M. Jockers, F. Moretti and M. Witmore, "Quantitative Formalism: an Experiment", Pamphlets of the Stanford Literary Lab, Jan. 2011.)

On its face, this seems reasonable. As Allison et al. comment, certain words are closely associated with genres, like "castle" with "gothic". However, they discover they are able to automatically and correctly cluster Shakespeare's plays by using frequencies of only 37 words:

"a", "and", "as", "be", "but", "for", "have", "he", "him", "his", "i", "in", "is", "it", "me", "my", "not", "of", "p_apos", "p_colon", "p_comma", "p_exlam", "p_hyphen", "p_period", "p_ques", "p_semi", "so", "that", "the", "this", "thou", "to", "what", "will", "with", "you", "your"

At this point, it is reasonable to pause before making any claim that the clustering -- correct though it may be -- is a result of or caused by genre recognition. To accept such a conclusion entails accepting the words above and their frequencies as the mysterious ingredients that separate "tragedy" from "comedy". Unfortunately, it appears Allison et al. accept just that, calling these word frequency features the observable tips of the "icebergs" that are genres.
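To make concrete how little such features encode, here is a minimal sketch of turning a text into a vector of relative frequencies over the 37 features listed above. The tokenizer, the mapping of punctuation marks to the `p_*` names, and the Euclidean distance are my own assumptions for illustration, not necessarily what Allison et al. did.

```python
import re
from collections import Counter

# Assumed mapping from punctuation marks to the p_* feature names above.
PUNCT = {"'": "p_apos", ":": "p_colon", ",": "p_comma", "!": "p_exlam",
         "-": "p_hyphen", ".": "p_period", "?": "p_ques", ";": "p_semi"}
WORDS = ["a", "and", "as", "be", "but", "for", "have", "he", "him", "his",
         "i", "in", "is", "it", "me", "my", "not", "of", "so", "that", "the",
         "this", "thou", "to", "what", "will", "with", "you", "your"]
FEATURES = sorted(WORDS + list(PUNCT.values()))  # 29 words + 8 marks = 37

def feature_vector(text):
    """Relative frequency of each feature among all tokens in `text`."""
    # Hypothetical tokenizer: runs of letters, or single punctuation marks.
    tokens = re.findall(r"[a-z]+|[':,!\-.?;]", text.lower())
    counts = Counter(PUNCT.get(t, t) for t in tokens)
    total = max(len(tokens), 1)  # avoid dividing by zero on empty text
    return [counts[f] / total for f in FEATURES]

def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Any clustering procedure fed these vectors sees only how often "the" or a comma occurs in a play. Whatever groups it finds, nothing in the procedure itself says those groups arise from genre rather than from something confounded with it.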

Hello, and welcome to Paper of the Day (Po'D): On the epistemological crisis in genomics edition. Today's paper is E. R. Dougherty, "On the epistemological crisis in genomics", Current Genomics, vol. 9, pp. 69-79, 2008. (I have discussed a previous paper by Dougherty and Dalton here.)

From its beginning, Dougherty's article is on the attack, and minces no words:


There is an epistemological crisis in genomics. The rules of the scientific game are not being followed. ... High-throughput technologies such as gene-expression microarrays have [led] to the accumulation of massive amounts of data, orders of magnitude in excess to what has heretofore been conceivable. But the accumulation of data does not constitute science, nor does the [a posteriori] rational analysis of data.

Dougherty moves from the ancient to more modern philosophy, highlighting the essential roles in Science played by experiments performed with controlled conditions, the formulation of knowledge through mathematics (models), and the necessity of verification of models through their prediction of data, not their explanation of data. The following paragraph makes this latter quality clearer:

Science is not about data fitting. Consider designing a linear classifier .... The result might be good relative to the assembled data; indeed, [it] might even classify the data perfectly. But this linear-classifier model does not constitute a scientific theory unless there is an error rate associated with the line, predicting the error rate on future observations. ... In practice, the error rate of a classifier is estimated via some error-estimation procedure, so that the validity of the model depends upon this procedure. Specifically, the degree to which one knows the classifier error, which quantifies the predictive capacity of the classifier, depends upon the mathematical properties of the estimation procedure. Absent an understanding of those properties, the results are meaningless.

Dougherty provides a nice illustration of how unreliable such error rates can be. Using real microarray data of genes (independent variables) and tumor types (dependent variable), Dougherty builds and tests several classifiers on subsets of the data, and compares their estimated error rates with their "true error rates" (which are estimated using all of the data). The two appear quite uncorrelated. (A similar example is on Dalton's research webpage.) Dougherty is led to the conclusion that many publications in genomics are "lacking scientific content", and refers to Kant when he remarks, "A good deal of the crisis in genomics turns on a return to 'groping in the dark'."
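The flavor of this comparison can be sketched with a small simulation. Everything here is an assumption made for illustration: synthetic one-dimensional Gaussian data instead of microarray data, a nearest-mean threshold classifier, a leave-one-out error estimate, and a large independent sample standing in for the "true" error. It is not Dougherty's procedure.

```python
import random

def sample(n, rng, shift=1.0):
    """Draw n labelled points: class 0 ~ N(0,1), class 1 ~ N(shift,1)."""
    data = []
    for i in range(n):
        label = i % 2  # alternate labels so classes are balanced
        data.append((rng.gauss(shift * label, 1.0), label))
    return data

def fit_threshold(data):
    """Nearest-mean rule in 1-D: threshold at the midpoint of class means."""
    x0 = [x for x, y in data if y == 0]
    x1 = [x for x, y in data if y == 1]
    return (sum(x0) / len(x0) + sum(x1) / len(x1)) / 2

def error_rate(threshold, data):
    """Fraction misclassified by the rule 'predict 1 if x > threshold'."""
    return sum((x > threshold) != (y == 1) for x, y in data) / len(data)

def loo_estimate(data):
    """Leave-one-out estimate of the error of the rule fit on `data`."""
    errors = 0
    for i, (x, y) in enumerate(data):
        t = fit_threshold(data[:i] + data[i + 1:])  # refit without point i
        errors += (x > t) != (y == 1)
    return errors / len(data)

def experiment(trials=200, n_train=20, n_test=5000, seed=0):
    """Return (estimated error, held-out error) pairs, one per small sample."""
    rng = random.Random(seed)
    test = sample(n_test, rng)  # large sample as a proxy for the true error
    pairs = []
    for _ in range(trials):
        train = sample(n_train, rng)
        t = fit_threshold(train)
        pairs.append((loo_estimate(train), error_rate(t, test)))
    return pairs
```

Plotting the pairs returned by `experiment()` against each other gives a picture in the spirit of Dougherty's figure: each point is one small-sample study, with its reported error on one axis and its error on much more data on the other.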

Since publication, this article appears to have been referenced only 31 times, 19 of which are not from Dougherty and/or Dalton. I look forward to seeing how it has been received in those papers, and how its lessons have been taken into practice. Looks like I will be reading a lot more bioinformatics research.

QMUL, there I go!

I am extremely pleased to report that in December I will be moving to the School of Electronic Engineering and Computer Science at Queen Mary University of London! I am really looking forward to joining and contributing to such a leading light in my field.

Now, how to migrate this blog?

This summer I have the opportunity to read more closely R. A. Bailey, Design of comparative experiments. Cambridge University Press, 2008. One thing I really like about her approach is its incorporation of linear algebra and probability theory, which is essentially estimation theory. This provides an unambiguous picture of what is going on in an experiment, the assumptions that are in play, and the relevance and meaning of particular statistical tests. Below, I explicate some of the fundamental subspaces of an experiment.

The program for EUSIPCO 2014 has been announced. Papers of interest for me include:

Comparison of Different Representations Based on Nonlinear Features for Music Genre Classification Athanasia Zlatintsi (National Technical University of Athens, Greece); Petros Maragos (National Technical University of Athens, Greece)

Fast Music Information Retrieval with Indirect Matching Takahiro Hayashi (Niigata University & Department of Information Engineering, Faculty of Engineering, Japan); Nobuaki Ishii (Niigata University, Japan); Masato Yamaguchi (Niigata University, Japan)

Audio Concept Classification with Hierarchical Deep Neural Networks Mirco Ravanelli (Fondazione Bruno Kessler (FBK), Italy); Benjamin Elizalde (ICSI Berkeley, USA); Karl Ni (Lawrence Livermore National Laboratory, USA); Gerald Friedland (International Computer Science Institute, USA)

Unsupervised Learning and Refinement of Rhythmic Patterns for Beat and Downbeat Tracking Florian Krebs (Johannes Kepler University, Linz, Austria); Filip Korzeniowski (Johannes Kepler University, Linz, Austria); Maarten Grachten (Austrian Research Institute for Artificial Intelligence, Austria); Gerhard Widmer (Johannes Kepler University Linz, Austria)

Speech-Music Discrimination: a Deep Learning Perspective Aggelos Pikrakis (University of Piraeus, Greece); Sergios Theodoridis (University of Athens, Greece)

Exploring Superframe Co-occurrence for Acoustic Event Recognition Huy Phan (University of Lübeck, Germany); Alfred Mertins (Institute for Signal and Image Processing, University of Luebeck, Germany)

Detecting Sound Objects in Audio Recordings Anurag Kumar (Carnegie Mellon University, USA); Rita Singh (Carnegie Mellon University, USA); Bhiksha Raj (Carnegie Mellon University, USA)

A Montage Approach to Sound Texture Synthesis Sean O'Leary (IRCAM, France); Axel Roebel (IRCAM, France)

A Compressible Template Protection Scheme for Face Recognition Based on Sparse Representation Yuichi Muraki (Tokyo Metropolitan University, Japan); Masakazu Furukawa (Tokyo Metropolitan University, Japan); Masaaki Fujiyoshi (Tokyo Metropolitan University, Japan); Yoshihide Tonomura (NTT, Japan); Hitoshi Kiya (Tokyo Metropolitan University, Japan)

Sparse Reconstruction of Facial Expressions with Localized Gabor Moments André Mourão (Universidade Nova Lisbon, Portugal); Pedro Borges (Universidade Nova de Lisboa, Portugal); Nuno Correia (Computer Science, Portugal); Joao Magalhaes (Universidade Nova Lisboa, Portugal)

Pornography Detection Using BossaNova Video Descriptor Carlos Caetano (Federal University of Minas Gerais, Brazil); Sandra Avila (University of Campinas, Brazil); Silvio Guimarães (PUC Minas, Brazil); Arnaldo Araújo (Federal University of Minas Gerais, Brazil)

Feature Level Combination for Object Recognition Abdollah Amirkhani-Shahraki (IUST & IranUniversity of Science and Technology, Iran)

Sparse Representation and Least Squares-based Classification in Face Recognition Michael Iliadis (Northwestern University, USA); Leonidas Spinoulas (Northwestern University, USA); Albert S. Berahas (Northwestern University, USA); Haohong Wang (TCL Research America, USA); Aggelos K Katsaggelos (Northwestern University, USA)

Greedy Methods for Simultaneous Sparse Approximation Leila Belmerhnia (CRAN, Université de Lorraine, CNRS, France); El-Hadi Djermoune (CRAN, Nancy-Universite, CNRS, France); David Brie (CRAN, Nancy Université, CNRS, France)

Sparse Matrix Decompositions for Clustering Thomas Blumensath (University of Southampton, United Kingdom)

Evaluation of Non-Linear Combinations of Rescaled Reassigned Spectrograms Maria Sandsten (Lund University, Sweden)

Today, I present a talk at the SoundSoftware 2014 Third Workshop on Software and Data for Audio and Music Research: "How reproducibility tipped the scale toward article acceptance".

I discuss a recent episode in which our submission of a negative result article -- contradicting previously published work -- was favorably reviewed, and eventually published (here). The review process, and the persuasion of the reviewers, were greatly aided by our efforts at reproducibility. We won a reproducibility prize last year for this work.

There appears to be a bevy of good looking papers. I am particularly looking forward to learning more about these:

A COMPOSITIONAL HIERARCHICAL MODEL FOR MUSIC INFORMATION RETRIEVAL

AN ANALYSIS AND EVALUATION OF AUDIO FEATURES FOR MULTITRACK MUSIC MIXTURES

AN ASSOCIATION-BASED APPROACH TO GENRE CLASSIFICATION IN MUSIC

AUTOMATIC INSTRUMENT CLASSIFICATION OF ETHNOMUSICOLOGICAL AUDIO RECORDINGS

CLASSIFYING EEG RECORDINGS OF RHYTHM PERCEPTION

CODEBOOK BASED SCALABLE MUSIC TAGGING WITH POISSON MATRIX FACTORIZATION

DETECTING DROPS IN EDM: CONTENT-BASED APPROACHES TO A SOCIALLY SIGNIFICANT MUSIC EVENT

EVALUATING THE EVALUATION MEASURES FOR BEAT TRACKING

IMPROVING MUSIC RECOMMENDER SYSTEMS: WHAT CAN WE LEARN FROM RESEARCH ON MUSIC TASTES?

INFORMATION-THEORETIC MEASURES OF MUSIC LISTENING BEHAVIOUR

JAMS: A JSON ANNOTATED MUSIC SPECIFICATION FOR REPRODUCIBLE MIR RESEARCH

MIR_EVAL

MODELING TEMPORAL STRUCTURE IN MUSIC FOR EMOTION PREDICTION USING PAIRWISE COMPARISONS

MUSIC CLASSIFICATION BY TRANSDUCTIVE LEARNING USING BIPARTITE HETEROGENEOUS NETWORKS

ON COMPARATIVE STATISTICS FOR LABELLING TASKS: WHAT CAN WE LEARN FROM MIREX ACE 2013?

ON CULTURAL AND EXPERIENTIAL ASPECTS OF MUSIC MOOD

ON INTER-RATER AGREEMENT IN AUDIO MUSIC SIMILARITY

TEN YEARS OF MIREX (MUSIC INFORMATION RETRIEVAL EVALUATION EXCHANGE): REFLECTIONS, CHALLENGES AND OPPORTUNITIES

THEORETICAL FRAMEWORK OF A COMPUTATIONAL MODEL OF AUDITORY MEMORY FOR MUSIC EMOTION RECOGNITION

TRANSFER LEARNING BY SUPERVISED PRE-TRAINING FOR AUDIO-BASED MUSIC CLASSIFICATION

WHAT IS THE EFFECT OF AUDIO QUALITY ON THE ROBUSTNESS OF MFCCS AND CHROMA FEATURES?


A few months ago, I submitted to ISMIR 2014 a paper essentially casting into the MIR conference community my five prescriptions for motivating scientific research in music information retrieval, as well as a summary of the story of Clever Hans. I provocatively titled the paper, "The future of scientific research in music information retrieval", the meaning of which comes from the first sentence of the abstract: "We make five prescriptions that can help ensure future research in music information retrieval (MIR) contributes valid (scientific) knowledge." I meant my submission not as a research paper presenting a new MIR system/problem/dataset, but as a position paper, summarizing in one place the major findings from my work and collaborations of the past two years in MIR, and proposing "a way forward" when it comes to what I term "a crisis in MIR evaluation: a large number of published works related to machine music listening (> 500) report results using evaluations that lack the validity for making any meaningful comparisons or conclusions with regards to machine music listening."

The reviews are in (rejection), but the machinery to respond to the comments does not exist. Undoubtedly, the four reviewers spent a good amount of time reviewing and discussing my paper from their perspectives and with a clear ability in the topics. The quality of the reviews (as well as on my other four submissions) is by and large exceptional among the conferences to which I submit, and I very much appreciate the reviewers' efforts. Their comments reveal where my text has fallen short of the goal, which helps me significantly to refine the delivery of the ideas I am advocating. Below, I try to correct these discrepancies in line with the reviewer comments. (I understand this is unconventional; however, I think the discussion is useful for illuminating my five prescriptions just published in JNMR.)

Hello, and welcome to Paper of the Day (Po'D): Intriguing properties of neural networks edition. Today's paper is: C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow and R. Fergus, "Intriguing properties of neural networks", in Proc. Int. Conf. Learning Representations, 2014. Today's paper is very exciting for me because I see "horses" nearly being called "horses" in a machine learning research domain outside music information retrieval. Furthermore, the arguments that this work is apparently causing resemble what I have received in peer review of my work. For instance, see the comments on this post. Or the reviews here. Some amount of press is also resulting, e.g., ZDnet, Slashdot; and the results of the paper are also being used to bolster the argument that the hottest topic in machine learning is over-hyped.

The one-line precis of this paper is: The deep neural network: as uninterpretable as it ever was; and now acting in ways that contradict notions of generalization.

Hello, and welcome to Paper of the Day (Po'D): Horses and more horses edition. Today's paper is: B. L. Sturm, "A Simple Method to Determine if a Music Information Retrieval System is a 'Horse'", IEEE Trans. Multimedia, 2014 (in press). This double-header of a Po'D also includes this paper: B. L. Sturm, C. Kereliuk, and A. Pikrakis, "A Closer Look at Deep Learning Neural Networks with Low-level Spectral Periodicity Features", Proc. 4th International Workshop on Cognitive Information Processing, June 2014.

The one-line precis of these papers is:
For some use cases, it is important to ensure Music Information Retrieval (MIR) systems are reproducing "ground truth" for the right reasons: Here's how.