## Paper of the Day (Po'D): Music discriminations by carp (Cyprinus carpio) edition

Hello, and welcome to the Paper of the Day (Po'D): Music discriminations by carp (Cyprinus carpio) edition. Today's paper is an interesting one: A. R. Chase, "Music discriminations by carp (Cyprinus carpio)", Animal Learning & Behavior, vol. 29, no. 4, pp. 336-353, 2001. This paper details a series of experiments aiming to show that fish are capable of categorizing musical sound by genre. I am interested in this paper primarily because its core is about evaluating a "black box" in a complex and human-centered task; and it comes from a scientific discipline in which it is of paramount importance that experiments have validity with respect to the scientific questions of interest. Hence, this paper might be relevant for evaluating music information systems (other black boxes) because of its scientific methodology. After a preliminary phase involving training three koi fish to discriminate between musical sound and silence, Chase performs four different experiments to train and test the fish in discriminating musical sound stimuli. The four experiments are designed to answer two specific questions:

1. Can fish learn to discriminate between complex auditory stimuli that humans can also learn to discriminate?
2. If so, does this behavior generalize to novel auditory stimuli satisfying the same discriminability?

While the results of the four experiments clearly give affirmative answers to both questions, it remains unclear which features the fish use to perform this task. Two experiments rule out some local features, such as timbre.

As the complex auditory stimuli, Chase uses musical audio. The stimuli used in three experiments are differentiable by humans along "genre" or "style", i.e., "blues" music (recordings of music by John Lee Hooker, Muddy Waters, and blues compilations) and "classical" music (recordings of music by Bach, Handel, Vivaldi, etc.). The stimuli used in the fourth experiment are differentiable by humans along melody. The results of the first three experiments show three fish capable of learning to discriminate between "blues" and "classical" music. Furthermore, probes with novel stimuli, and an iterative reversal test, argue that the fish were "exhibiting open-ended categorization" of the two kinds of stimuli.

The first two experiments do not show which features of the stimulus are being used by the fish, or whether they are performing the task "on the basis of deeper generic attributes" rather than an unknown discriminant. The third experiment thus attempts to answer whether the fish can learn to discriminate between musical sounds labeled either "blues" or "classical" without the cue of timbre. Music by John Lee Hooker, Bach, and Vivaldi was transcribed to MIDI, and rendered with the same and/or different instruments. The results show the fish learn to discriminate between the "styles" even when the signals are produced with the same MIDI instrument; but performance is worse when the same instrument is used than when the instruments are different. The final experiment attempts to answer whether the fish can learn to discriminate between two different melodies even though they have the same rhythm and timbre. The stimuli are a melody of Paganini, and one where the order of the notes is reversed but the timing is preserved. The one fish completing the experiment shows a capacity to perform this task. Several trials controlled for local features, such as starting and ending notes. Further work is necessary to determine which features the fish are learning to use to perform this task; but the experiments convincingly demonstrate that fish are able to work with complex auditory stimuli.

I really like how this paper shows the time and effort necessary to scientifically and efficiently answer real questions --- even though it treats musical genre in an artificial way (i.e., Aristotelian categorization). (To criticize this point any further would miss the message of the work.) The experiments it discusses take place over what appears to be at least two years, and cannot really take place over any shorter time-span because living subjects can only be rewarded with so much food at a time. Such a time commitment absolutely requires a solid experimental plan to minimize waste and maximize results. Chase performs four experiments (and a preliminary one), but answers several questions with each experiment, and also uses elements of one experiment to prepare the subjects for the next. The work in this paper exemplifies the kinds of considerations taken for granted in much experimental work in my own disciplines (signal processing and machine learning applied to audio and music signals). Machines need no reward or reinforcement, but why should they be evaluated any differently from how Chase evaluates Beauty, Oro and Pepi (the three fish)?

## The final lecture schedule

As my grant is now winding down, I am in the final push to publish a few more articles and go on an evangelism spree. Watch for when I am in your town!

1. Wednesday Oct. 30, 18h30, OFAI, Vienna
2. Wednesday Nov. 13, 15h30, MTG, Barcelona
3. Monday Nov. 25, 14h00, Télécom ParisTech, Paris
4. Tuesday Nov. 26, all day, AAU, Copenhagen
5. Wednesday Dec. 4, 11h00 Fraunhofer IAIS, Bonn

My current show is called "The crisis of evaluation in MIR".

Abstract: I critically address the "crisis of evaluation" in music information retrieval (MIR), with particular emphasis on music genre recognition, music mood recognition, and autotagging. I demonstrate four things: 1) many published results unknowingly use datasets with faults that render them meaningless; 2) state-of-the-art ("high classification accuracy") systems are fooled by irrelevant factors; 3) most published results are based upon an invalid evaluation design; and 4) a lot of work has unknowingly built, tuned, tested, compared and advertised "horses" instead of solutions. (The example of the horse Clever Hans provides an appropriate illustration.) I argue these problems occur because: 1) many researchers assume a dataset is a good dataset because many others use it; 2) many researchers assume evaluation approaches that are standard in machine learning or information retrieval are useful and relevant for MIR; 3) many researchers mistake systematic, rigorous, and standardized evaluation for scientific evaluation; and 4) problems and success criteria remain ill-defined, and thus evaluation poor, because researchers do not define appropriate use cases. I show how this "crisis of evaluation" can be addressed by formalizing evaluation in MIR to make clear its aims, parts, design, execution, interpretation, and assumptions. I also present several alternative evaluation approaches that can separate horses from solutions.

## PhD course Nov. 25-29, 2013

Topics in Music Analysis, Cognition and Synthesis,

Doctoral School of Engineering and Science at Aalborg University Copenhagen

The course consists of three parts covering the topics of music analysis, cognition, and synthesis with emphasis on recent advances. The first part covers methods and models for music analysis. This includes statistical methods for parameterization of music signals and methods for music information retrieval. In the second part, models of music perception and cognition are covered, including analysis of symbolic representations of music. Finally, physical models of music instruments are covered in the last part along with parametric and interactive methods for synthesis and spatialization of sounds.

Prerequisites: Basic knowledge of sound and music computing

## Paper of the Day (Po'D): Towards a universal representation for audio information retrieval and analysis Edition

Hello, and welcome to Paper of the Day (Po'D): Towards a universal representation for audio information retrieval and analysis Edition. Today's paper is, B. S. Jensen, R. Troelsgård, J. Larsen, and L. K. Hansen, "Towards a universal representation for audio information retrieval and analysis", Proc. ICASSP, 2013. My one line summary of this article:

A generative multi-modal topic model of music is built from low-level audio features, lyrics, and/or tags.

Essentially, the paper proposes modeling a piece of music by the generative model depicted in a figure from the paper.

For the music signals considered in this paper, there are $$S$$ music data (in the training dataset), $$M$$ modalities (e.g., 3 if using lyrics, tags and audio features), and $$T$$ "topics" (abstractions of the content or stuff in the modalities). For song $$s$$ and modality $$m$$, there are $$N_{sm}$$ "tokens", each of which generates a "word", i.e., the features extracted from that modality of that music. The goal is to model the words of some music data as a random process involving the parameters $$\alpha, \{\beta_m\}$$, and latent variables $$\{\phi_t\}, \{z_{sm}\}$$, and "tokens." This model "generates" each word of song $$s$$ in modality $$m$$ by drawing from a Dirichlet distribution parameterized by $$\alpha$$ a distribution over the topics, then drawing from this distribution a topic, then drawing from a Dirichlet distribution parameterized by $$\beta_m$$ a distribution over a vocabulary of "words" in modality $$m$$ for the drawn topic, and finally drawing a "word" from that distribution. The lead author Bjørn Jensen has given me a quick tutorial in this, starting from latent semantic analysis (LSA), moving to probabilistic LSA (pLSA), and ending with latent Dirichlet allocation (LDA).

First, LSA. We observe a set of documents $$\{d_i\}$$, and each document is a set of words $$\{w_j\}$$. We might want to discover in our set of documents what topics there are and what words compose the topics. We might want to find relevant documents in our set given a query. Or we might want to be able to predict the topics of an unseen document. So, we build a word co-occurrence matrix $$\MD$$, where each column is a document and each row is a word. Each element is the frequency of a word in a document. We posit that each of our documents is explained by a collection of words (observables) associated with several topics (latent variable). This is then a matrix factorization problem. We can perform PCA, or SVD, or non-negative matrix factorization, to obtain: $$\MD \approx \Phi\Theta$$. Each column of $$\Phi$$ is a topic, and each row denotes a word frequency characteristic of the topic. Each column of $$\Theta$$ describes how a document in our collection is composed by these topics.
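To make this concrete, here is a minimal numpy sketch of LSA by truncated SVD; the toy count matrix and the word labels in the comments are hypothetical stand-ins for a real corpus:

```python
import numpy as np

# Toy word-by-document count matrix D: rows are words, columns are documents.
# (Hypothetical counts; in practice D comes from a real document collection.)
D = np.array([[2, 3, 0, 0],   # "guitar"
              [1, 2, 0, 1],   # "blues"
              [0, 0, 3, 2],   # "fugue"
              [0, 1, 2, 3]],  # "counterpoint"
             dtype=float)

# Truncated SVD: D ~ U_k S_k V_k^T. Phi = U_k S_k holds the topics
# (word profiles); Theta = V_k^T holds each document's topic mixture.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
k = 2
Phi = U[:, :k] * s[:k]     # words x topics
Theta = Vt[:k, :]          # topics x documents

approx = Phi @ Theta       # rank-2 approximation of D
```

Here the first two singular vectors should roughly separate the "bluesy" words from the "classical" ones, which is the sense in which the factorization discovers topics.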

Now, pLSA. For our set of documents, what we are really interested in is discovering the joint probability of document-word co-occurrences: $$p(d,\vw)$$, where $$\vw$$ is a vector of word co-occurrences. Assuming that a document is created from topics, and words spring from these topics, and that the document and its words are conditionally independent given a topic, we can express this joint probability as $$p(d,\vw) = \sum_{z\in Z} p(d,\vw|z) p(z) = \sum_{z\in Z} p(d|z) p(\vw|z) p(z) = p(d) \sum_{z\in Z} p(\vw|z) p(z|d)$$ where $$Z$$ is the set of topics. Now, we have to learn from our set of documents the conditional probabilities $$\{p(\vw|z)\}_Z$$ describing the underlying set of topics in terms of the word frequencies, and we have to learn the topical composition of our documents $$\{p(z|d)\}$$. This can be achieved using Markov Chain Monte Carlo (MCMC) methods to discover the distributions that maximize $$p(d,\vw)$$ over our set of documents. (Note to self: review MCMC.) With this model then, we can do some of what we set out to do with LSA: discover in our set of documents what topics there are, what words compose the topics, and what topics are in a given document; or to find relevant documents in our set given a query. However, we cannot compute $$p(d^*,\vw)$$ for a new document $$d^*$$ because we do not know what generates $$p(z|d^*)$$ for this document. By specifying a model of $$p(z|d)$$, we move to LDA.
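As a concrete (if minimal) sketch, the two conditional distributions can be fit by expectation-maximization, the classical approach for pLSA (MCMC is an alternative). The counts here are randomly generated stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# n[d, w]: word counts for D documents over a W-word vocabulary (stand-in data)
n = rng.integers(1, 6, size=(6, 8)).astype(float)
D, W = n.shape
T = 2  # number of topics z

# Randomly initialize p(w|z) and p(z|d); each row normalized to a distribution
p_w_given_z = rng.random((T, W))
p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)
p_z_given_d = rng.random((D, T))
p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: responsibilities p(z|d,w) proportional to p(w|z) p(z|d)
    r = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]   # shape (D, T, W)
    r /= r.sum(axis=1, keepdims=True) + 1e-12
    # M-step: re-estimate both distributions from the expected counts
    expected = n[:, None, :] * r                            # shape (D, T, W)
    p_w_given_z = expected.sum(axis=0)
    p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + 1e-12
    p_z_given_d = expected.sum(axis=2)
    p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True) + 1e-12
```

After fitting, the rows of `p_w_given_z` are the topics $$\{p(\vw|z)\}_Z$$ and the rows of `p_z_given_d` are the topical compositions $$\{p(z|d)\}$$ of the training documents.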

Now, in LDA we assume the topic distribution $$p(z|d)$$, and perhaps the word distribution $$p(\vw|z)$$, arise from probabilistic models with unknown parameters. The resulting model is a true generative model, in that each word of a document comes from sampling from the sampled topic distribution, and then sampling from a sampled word distribution of that topic. (Note to self: learn what that even means.) With such a model, we can now estimate for a new document, $$p(z|d^*)$$ by a fold-in procedure (Note to self: see previous Note to self.), and thus $$p(d^*,\vw)$$. We can now answer such questions as: how likely is it that this new document was produced by the topics of our model? What are the topics of this new document?
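The generative story itself is easy to simulate; here is a minimal numpy sketch (the topic count, vocabulary size, and Dirichlet parameters are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)

T, V = 3, 10                 # number of topics, vocabulary size
alpha = np.full(T, 0.5)      # Dirichlet prior over topic mixtures
beta = np.full(V, 0.1)       # Dirichlet prior over word distributions

# Topic-specific word distributions, one Dirichlet draw per topic
phi = rng.dirichlet(beta, size=T)        # shape (T, V)

def generate_document(n_words):
    """Generate one document under the LDA generative story."""
    theta = rng.dirichlet(alpha)                  # this document's topic mixture
    z = rng.choice(T, size=n_words, p=theta)      # a topic for each token
    words = [rng.choice(V, p=phi[t]) for t in z]  # a word drawn from its topic
    return words

doc = generate_document(20)
```

Fitting the model (and the fold-in for a new document) is the hard part; sampling from it, as above, is just nested draws.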

Now, this Po'D considers modeling document co-occurrences with multiple modalities. So, it aims to solve $$p(d,\vw_1, \vw_2, \ldots, \vw_M) = p(d) \sum_{z\in Z} p(z|d) \prod_M p(\vw_m|z)$$ where $$\{\vw_m\}_M$$ is the set of document $$m$$-modality co-occurrences, and the assumption here is that a document is conditionally independent of all modalities given the topics, and that all modalities are independent. This is exactly the model in the figure above. Given a trained model and a new song, one can estimate $$p(z|d^*)$$ by holding all other quantities constant, using a portion of $$d^*$$ ("fold-in"), and sampling using MCMC.
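A minimal sketch of this multi-modal generative process, with hypothetical modality names and vocabulary sizes (this only simulates the model; fitting it, e.g., by MCMC, is another matter):

```python
import numpy as np

rng = np.random.default_rng(2)

T = 3                                            # number of topics
vocab = {"audio": 8, "lyrics": 12, "tags": 5}    # vocabulary size per modality
alpha = np.full(T, 0.5)

# One word distribution per topic per modality: phi[m] has shape (T, V_m)
phi = {m: rng.dirichlet(np.full(V, 0.1), size=T) for m, V in vocab.items()}

def generate_song(n_tokens):
    """Generate one 'song': a shared topic mixture, then tokens per modality."""
    theta = rng.dirichlet(alpha)      # song-level topic mixture, shared by all modalities
    song = {}
    for m, V in vocab.items():
        z = rng.choice(T, size=n_tokens[m], p=theta)       # topic per token
        song[m] = [rng.choice(V, p=phi[m][t]) for t in z]  # word per token
    return song

song = generate_song({"audio": 10, "lyrics": 15, "tags": 3})
```

The key assumption from the paper is visible in the code: the modalities share one topic mixture `theta` per song, but each modality has its own topic-conditional word distributions.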

Before I proceed, it is now time to address those notes to myself.

## Formalizing Evaluation in Music Information Retrieval: A Look at the MIREX Automatic Mood Classification Task

One thing I have come to appreciate during the past two years is the necessity to employ formalism. Formalism is a way to see and work with things without ambiguity, to circumvent semantics, to find flaws and avoid them, and to make assumptions clear, along with how they qualify conclusions. I might have looked at such a sentence two years ago and thought it a senseless piece of self-serving gibberish irrelevant to the way I was working --- which was quite formal, if I may say so! I was using standardized datasets and accepted approaches to systematically evaluate algorithms for music genre recognition, not to mention recovery from compressive sampling. I was even testing for statistical significance!

And things were good; but, I began to see the necessity for something deeper than the standardized and systematic ways in which I was working. I then developed a deep appreciation for analysis; and learned first hand how bad ideas and wasted efforts can be avoided with a little analysis. And then the edifice of my standardized and systematic ways of working was cracked to its foundation.

And things were bad; but then I realized this summer the core problem.

## Paper of the Day (Po'D): Multi-label sparse coding for automatic image annotation Edition

Hello, and welcome to the Paper of the Day (Po'D): Multi-label sparse coding for automatic image annotation Edition. Today's paper is C. Wang, S. Yan, L. Zhang, and H.-J. Zhang, "Multi-label sparse coding for automatic image annotation," in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, pp. 1643-1650, 2009. My one line description of this work is:

Multilabel sparse coding is reported to produce good results.

I will go in reverse from how the paper presents the approach. Finally, we decode the sparse code of a query datum. Given a "label matrix" $$\MC$$ of size $$N_c \times N$$ --- where the $$i$$th column denotes which of $$N_c$$ tags are relevant to the $$i$$th training datum --- and a solution vector $$\alpha_t$$ from a query, we find the rows of $$\MC\alpha_t$$ with the largest values. (This is never defined in the paper.) Since each row of $$\MC$$ is associated with one tag, we thereby select those tags relevant to the query.

Penultimately, we produce a sparse code for a query $$\MP\vx_t$$. To do this we find the solution vector $$\alpha_t$$ by solving $$\alpha_t = \arg\min_\alpha \lambda\|\alpha\|_1 + \frac{1}{2}\|\MP\vx_t - [\MP\MX | \MI]\alpha\|_2^2$$ where $$\MX = [\vx_1, \vx_2, \ldots, \vx_N]$$ is the $$N$$ training data, and $$\MP$$ is a projection. ($$\lambda$$ is not defined in the paper.) (Note that $$\alpha_t$$ has more than $$N$$ rows, so we assume we chop off the end so only the first $$N$$ rows remain.)
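As a sketch of this step (with $$\MP = \MI$$, random stand-in data, and my own choice of $$\lambda$$, since the paper does not define it), the lasso problem can be solved with a simple iterative shrinkage-thresholding (ISTA) loop, and the decoding of the previous paragraph follows:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimize lam*||a||_1 + 0.5*||y - A a||_2^2 by ISTA."""
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the smooth part
    a = np.zeros(A.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + A.T @ (y - A @ a) / L, lam / L)
    return a

rng = np.random.default_rng(3)
d, N, Nc = 20, 15, 4
X = rng.standard_normal((d, N))                    # training feature vectors
C = (rng.random((Nc, N)) < 0.3).astype(float)      # stand-in label matrix
x_query = X[:, 2] + 0.01 * rng.standard_normal(d)  # query near training item 2

A = np.hstack([X, np.eye(d)])   # dictionary [X | I] (taking P = I)
a = lasso_ista(A, x_query, lam=0.1)
alpha_t = a[:N]                 # keep only the coefficients on the training data
scores = C @ alpha_t            # relevance score per tag; take the largest rows
```

Because the query is (by construction) close to training item 2, the lasso concentrates its weight there, and the query inherits that item's tags.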

Antepenultimately, we set $$\MP = \MI$$, or form the projection $$\MP$$ (Wang et al. refer to this as "multilabel linear embedding") by selecting from the eigenvectors of $$\MX\left[\MD - \MW_1 + \frac{\beta}{2}(\MI-\MW_2)^T(\MI-\MW_2)\right]\MX^T$$ where $$[\MD]_{ii} := \ve_i^T\MW_1\mathbf{1} - [\MW_1]_{ii}$$, $$\MW_1$$ and $$\MW_2$$ are "semantic graphs", and $$\beta = 0.1$$ in the paper. Each column of $$\MP^T$$ is an eigenvector of the above matrix, and we keep however many we want to keep. (For the data in the paper, this goes from a space of 40,960, to 1000-2000.)
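A minimal numpy sketch of this projection step; the semantic graphs here are random symmetric placeholders rather than the ones constructed from tags, and which eigenvectors to keep follows the paper's criterion (I take the smallest eigenvalues here):

```python
import numpy as np

rng = np.random.default_rng(4)
d, N = 10, 8
X = rng.standard_normal((d, N))     # training features, one column per item

# Hypothetical semantic graphs W1, W2 (random placeholders so the example
# is self-contained; see the construction described in the text)
W1 = (rng.random((N, N)) < 0.3).astype(float)
W1 = np.maximum(W1, W1.T)
W2 = rng.random((N, N)) * W1
beta = 0.1

# [D]_ii = e_i^T W1 1 - [W1]_ii, i.e., row sums of W1 minus its diagonal
D_mat = np.diag(W1.sum(axis=1) - np.diag(W1))
M = X @ (D_mat - W1
         + (beta / 2) * (np.eye(N) - W2).T @ (np.eye(N) - W2)) @ X.T

# Keep k eigenvectors as the rows of P (columns of P^T)
eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2)   # symmetrize for stability
k = 4
P = eigvecs[:, :k].T      # each row is a kept eigenvector
embedded = P @ X          # training features projected to k dimensions
```

For the data in the paper this reduces a 40,960-dimensional feature space to 1000-2000 dimensions; here it is just 10 down to 4.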

Preantepenultimately, we create the semantic graphs of the training data in the following way. First, we create the label matrix $$\MC$$ from the training data. The $$i$$th column is non-zero only in the rows associated with the tags of the $$i$$th training vector. Then, all columns of $$\MC$$ are made unit norm. Define $$[\MW_1]_{ij} = 1$$ if $$\vc_i = \vc_j$$, and zero otherwise. Thus, $$\MW_1$$ specifies which training data share the same set of tags. Second, the $$i$$th column of matrix $$\MW_2$$ is created in the following way. Remove the $$i$$th column of $$\MC$$ to create $$\MC'$$. Find a sparse representation of the $$i$$th column of $$\MC$$ by solving $$\beta_i = \arg\min_\beta \lambda\|\beta\|_1 + \frac{1}{2}\|\vc_i - [\MC' | \MI]\beta\|_2^2.$$ For $$1 \le j \le i-1$$, set $$[\MW_2]_{ij} = \beta_j$$; and for $$i+1 \le j \le N$$, set $$[\MW_2]_{ij} = \beta_{j-1}$$. ($$\lambda$$ here is not defined in the paper.) $$\MW_1$$ and $$\MW_2$$ thus attempt to embody how the training data are related in a tag space, or semantically, rather than in the feature space. And so begins the procedure for multi-label sparse coding.
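Here is a minimal sketch of constructing $$\MC$$ and $$\MW_1$$ from hypothetical tag assignments (I omit $$\MW_2$$, which requires one lasso solve per column, as above):

```python
import numpy as np

# Hypothetical tag assignments: tags[i] is the tag set of training item i
Nc = 4
tags = [{0, 2}, {1}, {0, 2}, {3}, {1}]
N = len(tags)

# Label matrix C: column i indicates item i's tags, then unit-normalized
C = np.zeros((Nc, N))
for i, ts in enumerate(tags):
    for t in ts:
        C[t, i] = 1.0
C /= np.linalg.norm(C, axis=0, keepdims=True)

# W1: [W1]_ij = 1 iff items i and j carry exactly the same tag set
W1 = np.array([[1.0 if np.allclose(C[:, i], C[:, j]) else 0.0
                for j in range(N)] for i in range(N)])
```

With these toy tags, items 0 and 2 (both tagged {0, 2}) and items 1 and 4 (both tagged {1}) are linked in $$\MW_1$$; everything else is not.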

A variant of this approach was adopted for automatic tagging of music signals in, Y. Panagakis, C. Kotropoulos, and G. R. Arce, "Sparse multi-label linear embedding nonnegative tensor factorization for automatic music tagging," in Proc. EUSIPCO, (Aalborg, Denmark), pp. 492-496, Aug. 2010. Instead of posing the sparse representation problems above as a Lagrangian, they are posed as minimization subject to equality constraints. Furthermore, tensors are used rather than supervectors of features.

The empirical results of Wang et al. show that for two image datasets, performance is about the same whether $$\MP$$ is designed by the above procedure or no "embedding" is done, i.e., $$\MP = \MI$$. Panagakis et al. use the embedding procedure, and report high performance on a music tagging dataset. Without the tensor approach, but still using the embedding, the results remain competitive.

## Deconstructing statistical questions and vacuous papers

D. J. Hand, "Deconstructing statistical questions," J. Royal Statistical Society A (Statistics in Society), vol. 157, no. 3, pp. 317-356, 1994.

This is a remarkable paper, addressing "errors of the third kind": applying a statistical tool to correctly answer the wrong question. This type of error can occur when a research question is not defined in sufficient detail, or worse, when a tool is used simply because it is convenient, and/or gives the result desired. Hand gives many illustrative examples of how things can go very wrong from the beginning, and argues that before proceeding to apply the numerous statistical tools available in software packages today, we all must "deconstruct" with care the scientific and relevant statistical questions that we actually seek to answer.

At the end of the article, there are 24 (mostly) laudatory responses to it --- including one by John Tukey. These are like well-thought-out comments on reddit, and provide revealing looks at the actual practice of statistics in science, and the practice of science with statistics. One in particular strikes me, because it is about the practice of science with statistics in academia. Donald Preece begins:

Professor Hand speaks of the questions that the researcher wishes to consider. There are often three in number:
1. How do I obtain a statistically significant result?
2. How do I get my paper published?
3. When will I get promoted?
So Professor Hand's suggestions must be supplemented by a recognition of the corruptibility and corruption of the scientific research process. Nor can we overlook the constraints imposed by inevitable limitation of the resources. Needing further financial support, many researchers ask merely 'How do I get results?', meaning by 'results', not answers to questions, but things that are publishable in glossy reports.
This, in particular, hit home, especially after I accidentally read E. R. Dougherty and L. A. Dalton, "Scientific knowledge is possible with small-sample classification," EURASIP J. Bioinformatics and Systems Biology, vol. 10, 2013. In their recent article, Dougherty and Dalton pull no punches:

Since scientific validity depends on the predictive capacity of a model, while an appropriate classification rule is certainly beneficial to classifier design, epistemologically, the error rate is paramount. ... [A]ny paper that applies an error estimation rule without providing a performance characterization relevant to the data at hand is scientifically vacuous. Given the near universality of vacuous small-sample classification papers in the literature [where error is not estimated], one could easily reach the conclusion that scientific knowledge is impossible in small-sample settings. Of course, this would beg the question of why people are writing vacuous papers and why journals are publishing them.

## A mean lesson about the mean

It seems like taking the mean of a sample is not controversial. However, it could be the wrong thing to do. Consider this neat example from D. J. Hand, "Deconstructing statistical questions," J. Royal Statistical Society A (Statistics in Society), vol. 157, no. 3, pp. 317-356, 1994.

An English researcher and a French researcher both test two cars of each of two types to determine which type is the more fuel efficient. One researcher measures miles per gallon, and the other gallons per mile. The following data are collected:

The English researcher finds the average miles per gallon of type 1 cars is greater than that of type 2, so they conclude type 1 is more fuel efficient. However, the French researcher finds the average gallons per mile of type 2 cars is less than that of type 1, so they conclude type 2 is more fuel efficient.
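With some hypothetical numbers (Hand's actual table is not reproduced here), the paradox is easy to verify:

```python
# Hypothetical fuel measurements; the concrete numbers are my own stand-ins,
# not the data from Hand's paper.
mpg_type1 = [10, 50]      # miles per gallon, two cars of type 1
mpg_type2 = [25, 30]      # miles per gallon, two cars of type 2

def mean(xs):
    return sum(xs) / len(xs)

# English researcher: average miles per gallon (higher is better)
# mean(mpg_type1) = 30.0 > mean(mpg_type2) = 27.5, so type 1 "wins"

# French researcher: average gallons per mile on the SAME cars (lower is better)
gpm_type1 = [1 / x for x in mpg_type1]
gpm_type2 = [1 / x for x in mpg_type2]
# mean(gpm_type1) = 0.06 > mean(gpm_type2) ~ 0.0367, so type 2 "wins"
```

The reciprocal of a mean is not the mean of the reciprocals, so the two researchers can rank the types oppositely from identical measurements.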

Who is right?

## Paper of the Day (Po'D): Revisiting Inter-Genre Similarity Edition

Hello, and welcome to the Paper of the Day (Po'D): Revisiting Inter-Genre Similarity Edition. Some work from my visits to Portugal earlier this year has finally been given the green light: B. L. Sturm and F. Gouyon, "Revisiting Inter-Genre Similarity", IEEE Signal Processing Letters, 2013 (accepted). My one-line description of this work is:

Be wary of an idea that sounds good and intuitive until analysis shows it to be good.

This paper addresses a former Po'D: Automatic classification of musical genres using inter-genre similarity edition. Our attempts at reproducing the results in that work are here, here, and here. After finding that our results were nowhere near those published, we sought answers through analysis. That is where this paper begins.

In short, we show that while the idea proposed in the original publication sounds good and intuitive, it is plainly not a good idea. (This is, I think, a great example of how intuition can seriously lead one astray.) Once we put the inter-genre similarity approach in the context of naive Bayesian classification, it becomes clear why it can't be superior to the much simpler approach of naive Bayesian classification. We add some empirical experiments to drive home this point. We make available the code to reproduce all figures in our paper exactly.

In fact, it appears that the reviewers put a lot of weight on the pains we took to make our paper reproducible. A few of the reviewers actually dug into some of it to experiment with different parameters. Here are a few comments from reviewers that speak to this point.

Some of the previous reviews [the first version was rejected, and the comment is about our revision and comments to the previous reviews] expressed surprise at the big discrepancy between the results obtained in this paper and the original results and believe that there might be some issue with the implementation. In a case like that I think the reproducible implementation is the one that should be taken seriously.
Overall I think there is a big emphasis on novelty and new results in engineering but reproducibility and repetition of experiment are a central foundation of good science and engineering and papers like this should be encouraged rather than discouraged.
A big pro of the paper at hand is that the authors foster reproducability and even make available their source code. This has already been mentioned by the reviewers, but I would like to highlight and appreciate it again. This is really great practice of good science, but unfortunately not always seen in the signal processing and music-IR domains, unlike in other domains.

## Using MPTK as a library, part 1

I am working on incorporating the MPTK library (0.7.0) into a JUCE application. Everything is fine until I put the single line "#include <mptk.h>" in one of my files. (I am linking with "-lmptk", and adding "/usr/local/include" to the header search paths.) Then, in XCode (4.6.3) I get these errors:

```
In file included from /Users/bobs/Aalborg/research/201307/mpdgui/mpdgui/Builds/MacOSX/../../Source/MainComponent.cpp:9:
/usr/local/include/mptk.h:1332:4: error: "No FFT implementation was found !"
# error "No FFT implementation was found !"
^
/usr/local/include/mptk.h:3217:78: error: unknown type name 'GP_Pos_Book_c'
MPTK_LIB_EXPORT virtual MP_Support_t update_ip( const MP_Support_t *touch, GP_Pos_Book_c* );
^
/usr/local/include/mptk.h:3225:128: error: unknown type name 'GP_Param_Book_c'
MPTK_LIB_EXPORT virtual void update_frame( unsigned long int frameIdx, MP_Real_t *maxCorr, unsigned long int *maxFilterIdx, GP_Param_Book_c* );
```

and so on. The first error I can get rid of by putting "#define HAVE_FFTW3 1" before the include of mptk.h. The second error and others I can get rid of by adding "/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk" to the header search paths.

This leads me to my first question: why does the installation of MPTK create libmptk.dylib, and create and copy mptk.h, but not copy all the other headers (which mptk.h itself includes) into the include path? Is one really expected to add the source directory to the search paths?

This doesn't solve all errors. Now, I get the following:

```
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:67:3: error: unknown type name 'MP_Signal_c'
MP_Signal_c *s;
^
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:82:3: error: unknown type name 'MP_Real_t'
MP_Real_t* atomBufferTemp;
^
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:85:3: error: unknown type name 'MP_Real_t'
MP_Real_t* frameBufferTemp;
```

and so on. This is strange because "MP_Signal_c" is defined in "mp_signal.h", which is in "/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk". Some of the errors go away if I put at the top of block.h the forward declaration "class MP_Signal_c;" as well as the line #include "mp_types.h". But then I get new errors:

```
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:126:3: error: unknown type name 'MPTK_LIB_EXPORT'
MPTK_LIB_EXPORT virtual int plug_signal( MP_Signal_c *setSignal );
^
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:126:19: error: expected member name or ';' after declaration specifiers
MPTK_LIB_EXPORT virtual int plug_signal( MP_Signal_c *setSignal );
~~~~~~~~~~~~~~~ ^
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:128:3: error: unknown type name 'MPTK_LIB_EXPORT'
MPTK_LIB_EXPORT virtual ~MP_Block_c();
```

and so on. I can get rid of these errors by adding #include "dsp_windows.h" to block.h. But now I get these errors:

```
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:163:45: error: unknown type name 'TiXmlElement'
MPTK_LIB_EXPORT bool write_to_xml_element(TiXmlElement * blockElement);
^
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:171:78: error: unknown type name 'GP_Pos_Book_c'
MPTK_LIB_EXPORT virtual MP_Support_t update_ip( const MP_Support_t *touch, GP_Pos_Book_c* );
^
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:179:128: error: unknown type name 'GP_Param_Book_c'
MPTK_LIB_EXPORT virtual void update_frame( unsigned long int frameIdx, MP_Real_t *maxCorr, unsigned long int *maxFilterIdx, GP_Param_Book_c* );
```

and so on. I get rid of these errors when I add to block.h the line #include "tinyxml.h", and the forward declarations "class GP_Pos_Book_c; class GP_Param_Book_c; class GP_Book_c; class MP_Atom_c; class MP_Dict_c;". Then I get new errors:

```
/usr/local/include/mptk.h:4305:39: error: unknown type name 'GP_Block_Book_c'; did you mean 'GP_Pos_Book_c'?
MPTK_LIB_EXPORT MP_Real_t update (GP_Block_Book_c*);
^
/Users/bobs/Aalborg/research/201307/MPTK-Source-0.7.0/src/libmptk/block.h:57:7: note: 'GP_Pos_Book_c' declared here
class GP_Pos_Book_c;
^
In file included from /Users/bobs/Aalborg/research/201307/mpdgui/mpdgui/Builds/MacOSX/../../Source/MainComponent.cpp:12:
/usr/local/include/mptk.h:7790:26: error: unknown type name 'GP_Pos_Range_Sub_Book_c'
MPTK_LIB_EXPORT virtual GP_Pos_Range_Sub_Book_c* get_range_book(unsigned long int minPos,
^
```
where the problem is now in the installed mptk.h file. I delete the mptk.h in /usr/local/include, and reinstall MPTK with the changes to the source above. This does not help. So, I see in mptk.h that the header having problems is dict.h, to which I add the forward declarations "class GP_Block_Book_c; class GP_Pos_Range_Sub_Book_c;". Then I remove mptk.h from /usr/local/include, and reinstall it. This takes care of those errors, but then I get four more:

```
/usr/local/include/mptk.h:7803:51: error: return type of virtual function 'get_range_book' is not covariant with the return type of the function it overrides ('GP_Pos_Range_Sub_Book_c' is incomplete)
MPTK_LIB_EXPORT virtual GP_Pos_Range_Sub_Book_c* get_range_book(unsigned long int minPos,
^
/usr/local/include/mptk.h:8062:52: error: return type of virtual function 'get_pos_book' is not covariant with the return type of the function it overrides ('GP_Pos_Range_Sub_Book_c' is incomplete)
MPTK_LIB_EXPORT virtual GP_Pos_Range_Sub_Book_c* get_pos_book(unsigned long int pos);
^
/usr/local/include/mptk.h:8069:52: error: return type of virtual function 'insert_pos_book' is not covariant with the return type of the function it overrides ('GP_Pos_Range_Sub_Book_c' is incomplete)
MPTK_LIB_EXPORT virtual GP_Pos_Range_Sub_Book_c* insert_pos_book(unsigned long int pos);
^
/usr/local/include/mptk.h:8077:52: error: return type of virtual function 'get_range_book' is not covariant with the return type of the function it overrides ('GP_Pos_Range_Sub_Book_c' is incomplete)
MPTK_LIB_EXPORT virtual GP_Pos_Range_Sub_Book_c* get_range_book(unsigned long int minPos,unsigned long int maxPos);
```

Now, how do I resolve these errors?

I am hoping that these changes will make it clearer how to use the MPTK library in building a new application.

UPDATE: After ripping out all of gp (gradient pursuit), modifying the relevant CMakeLists.txt, and reinstalling MPTK, I can now "#include <mptk.h>" without errors. MPTK-Source-0.7.0.zip

Bob L. Sturm, Associate Professor
Audio Analysis Lab
Aalborg University Copenhagen
A.C. Meyers Vænge 15
DK-2450 Copenhagen SV, Denmark
Email: bst_at_create.aau.dk
