# September 2010 Archives

## Summing the entries of a special matrix, pt.2

Continuing on from yesterday, I now reveal the answer for the special case of $$\gamma = 0.5$$. This allows us to use the following expression for the partial sum of a harmonic series: $$\sum_{m=1}^{n} \frac{1}{m} = \ln n + \gamma_E + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} + \mathcal{O}(n^{-6}),$$ where $$\gamma_E \approx 0.5772$$ is the Euler-Mascheroni constant.
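As a quick numerical check of this expansion (a sketch in Python; the constant and truncation are exactly as stated above):

```python
import math

def harmonic_partial_sum(n):
    """Direct computation of sum_{m=1}^{n} 1/m."""
    return sum(1.0 / m for m in range(1, n + 1))

def harmonic_asymptotic(n):
    """Asymptotic expansion: ln n + gamma_E + 1/(2n) - 1/(12 n^2) + 1/(120 n^4)."""
    gamma_E = 0.5772156649015329  # Euler-Mascheroni constant
    return math.log(n) + gamma_E + 1/(2*n) - 1/(12*n**2) + 1/(120*n**4)

n = 100
exact = harmonic_partial_sum(n)
approx = harmonic_asymptotic(n)
# The error term is O(n^{-6}), so at n = 100 the two agree to many digits.
print(exact, approx, abs(exact - approx))
```

At $$n = 100$$ the discarded terms are on the order of $$10^{-13}$$, so the expansion is already essentially exact.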

## Summing the entries of a special matrix, pt.1

Consider that we have the matrix above, where $$\{ \vh_i \in \mathcal{R}^K : || \vh_i || = 1 \}$$ is a set of $$n$$ atoms from a real dictionary, and each real $$a$$ is a positive weight. We form a matrix $$\MG_{iq}$$ of the $$n^2$$ inner products of these $$2n$$ atoms, and the outer product $$\MA_{iq}$$ of the two length-$$n$$ weight vectors. The Schur (elementwise) product of these matrices creates a matrix whose entries sum to the inner product $$\langle \vv_{i}, \vv_{q} \rangle$$, where $$\vv_{i} = \sum_{m=1}^n a_{im}\vh_{im}$$ and $$\vv_{q} = \sum_{m=1}^n a_{qm}\vh_{qm}$$. We may not know all of the weights, but we do know that they obey $$0 < a_{m+1} \le a_m \le C m^{-\gamma}, m \in \mathcal{Z}_+$$ for some $$C,\gamma > 0$$. Now, the problem:

Estimate $$\langle \vv_{i}, \vv_{q} \rangle$$ to within some precision by summing only a few elements of $$\MA_{iq} \bullet \MG_{iq}$$.
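To make the setup concrete, here is a small numerical sketch in Python (the dictionaries and weights are random stand-ins, with $$C = 1$$ and $$\gamma = 0.5$$) verifying that the entries of the Schur product sum to the inner product:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 16, 8

# n unit-norm atoms for each of the two representations (hypothetical random dictionary)
H_i = rng.standard_normal((K, n)); H_i /= np.linalg.norm(H_i, axis=0)
H_q = rng.standard_normal((K, n)); H_q /= np.linalg.norm(H_q, axis=0)

# positive, non-increasing weights obeying a_m <= C m^{-gamma}
C, gamma = 1.0, 0.5
a_i = C * np.arange(1, n + 1) ** -gamma
a_q = C * np.arange(1, n + 1) ** -gamma

G = H_i.T @ H_q          # n x n matrix of inner products of the atoms
A = np.outer(a_i, a_q)   # outer product of the two weight vectors

v_i = H_i @ a_i
v_q = H_q @ a_q

# The sum of the entries of the Schur (elementwise) product equals <v_i, v_q>
lhs = np.sum(A * G)
rhs = v_i @ v_q
print(lhs, rhs)
```

The question above is then: which few entries of `A * G` suffice to estimate `rhs` to a given precision, given only the decay bound on the weights?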

## Comparing Audio Signals in a Sparse Domain

It is push-time for me in the process of revising an article on comparing audio signals in a sparse domain. (Nine days left.) Our main point (at least in today's version) is that sparse and parametric models of data provide flexible ways of comparing and/or classifying the contents in audio signals --- which has been argued before, e.g., just to cite a few:

## A Successful HAID 2010!

On Thursday and Friday of this past week, the 2010 Haptic and Audio Interaction Design (HAID) conference took place at Aalborg University Copenhagen (AAUK), bringing together dozens of researchers interested in a variety of subjects related to multimodal interaction and information display. I met researchers from as far away as Qatar, Japan, and even California. The keynote talks were fantastic as well: Dr. Søren Bech (Head of Research at Bang & Olufsen) spoke about measuring user experience with mechanical switches with and without haptic feedback; Professor Vincent Hayward (UPMC Paris 6) presented the high-level results of a large number of experiments with many novel devices that take advantage of the sensations in our fingertips; and Associate Professor Davide Rocchesso (IUAV, Italy) presented past and present work investigating the possibilities (both good and bad) of using audio to enrich our interactions with everyday objects. The papers, posters, and demos presented showcased an equally wide variety of interesting applications and experiments.

The conference was ably chaired by Rolf Nordahl (AAUK), and Professor Stephen Brewster (University of Glasgow, UK). The scientific chairs were Federico Fontana (University of Udine, IT) and Stefania Serafin (AAUK). And the poster and demo sessions, which provided a nice way to have coffee, eat a pastry, and still be learning, were completely organized by assistant professor Sofia Dahl (AAUK) and Kristina Daniliauskaite (AAUK). The conference ran very nicely, the lunches were great, and the social evening at Vesterbro Bryghus was great fun.

Congratulations to all those involved in making HAID 2010 a success! Now I am looking forward to the future challenges of helping organize and host a larger conference at AAUK.

## Common Errors with MPTK Output

A common error I see in interpreting the output of MPTK is the idea that the amplitudes of atoms in a book correspond to the weights of the unit norm atoms most often used in the description of matching pursuit, as well as other greedy iterative descent strategies. Consider the following plot:
Here we see, for each atom in a book, the quotient of its energy and its squared amplitude. Were MPTK using exclusively real and continuous atoms, and not optimizing the phase using complex atoms, this quotient would always be 1. However, because MPTK builds each best real discrete atom by adjusting the amplitudes of cosine and sine atoms (or the phase of a complex atom), the squared atom weight is not always proportional to the atom energy. Keeping the atom amplitude in the book, rather than its energy, makes signal reconstruction faster because we can skip normalizing each discrete atom.
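A small Python sketch illustrates why: the energy of an un-normalized windowed real atom $$w[t]\cos(2\pi f t + \phi)$$ depends on the phase $$\phi$$, so the squared amplitude alone cannot stand in for the energy (the window, length, and frequency here are arbitrary choices for illustration):

```python
import numpy as np

N = 64
t = np.arange(N)
w = np.hanning(N)       # analysis window
f = 1.0 / N             # low normalized frequency, where phase matters strongly

# Energy of the un-normalized real atom w[t]*cos(2*pi*f*t + phase) as phase varies
energies = {phase: np.sum((w * np.cos(2*np.pi*f*t + phase))**2)
            for phase in (0.0, np.pi/4, np.pi/2)}
for phase, e in energies.items():
    print(f"phase = {phase:.3f}  energy = {e:.4f}")
```

The printed energies differ across phases, so two atoms with identical amplitudes but different optimized phases have different energies before normalization.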

This optimization is a bit annoying because I want the option to work in a world where the atoms have unit norm. So what I do is use the "-E" option in mpd to save to a file the energy decay of the decomposition process. The first element is the energy of the signal, followed by its energy after the first atom is removed, and so on. Then in MATLAB I use the following code to read it in and find the energies of each unit norm atom:
fid = fopen(energyfilename,'r');
booken = fread(fid, inf, 'double'); % read the energy decay values (assuming doubles; adjust to the file format)
fclose(fid);
alpha = flipud(diff(flipud(booken)));
The variable "alpha" contains the energies of each atom in the order that they were found.
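The same differencing can be sketched in Python; the values below are synthetic stand-ins for the contents of the energy-decay file (note that `flipud(diff(flipud(x)))` is just the negated forward difference):

```python
import numpy as np

# Synthetic energy decay: the signal energy, followed by the residual energy
# after each atom is subtracted (stand-ins for the values read from the -E file)
booken = np.array([10.0, 6.0, 3.5, 2.0, 1.2])

# Energy of each unit-norm atom, in the order found: E_{k-1} - E_k
alpha = -np.diff(booken)
print(alpha)   # atom energies: 4.0, 2.5, 1.5, 0.8
```

Dividing the squared MPTK amplitudes by these energies then gives exactly the quotient plotted above.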

## MPTK 0.5.8: MATLAB utility readbook()

Once I have successfully decomposed a signal, it is time to work with the data in MATLAB. After compiling the function readbook using mex, I can do the following in MATLAB:

This gives me the structure

>> book

book =

format: '0.1'
numAtoms: 4543
numChans: 1
numSamples: 176400
sampleRate: 44100
index: [5x4543 double]
atom: [1x9 struct]

Most of these items are self-explanatory, but what of the last two?

>> book.index(:,1:5)

ans =

1           2           3           4           5
3           6           6           6           6
1           1           2           3           4
1           1           1           1           1
0           0           0           0       16384

What does this mean? Digging around the code in "mxBook.cpp", I see that each row is as follows:

1. MP decomposition iteration
2. atom type index (into the structure array book.atom)
3. atom index (into the parameter arrays of that atom type in book.atom)
4. number of atoms selected (always 1 here, since the signal is not stereo)
5. time position of this atom

So, for the first iteration of the above data we have an atom of the 3rd type, the structure of which we can get by

>> book.atom(3)

ans =

type: 'gabor'
params: [1x1 struct]

The parameters of all the atoms of this type are given by

>> book.atom(3).params

ans =

amp: [894x1 double]
chirp: [894x1 double]
freq: [894x1 double]
len: [894x1 double]
phase: [894x1 double]
pos: [894x1 double]
windowoption: [894x1 double]
windowtype: [894x1 double]

And thus, the first element of each of these vectors is the corresponding parameter of the first atom found by MP, the frequency of which is

>> book.atom(3).params.freq(1)

ans =

0.0146

where 0.5 is the Nyquist frequency. Et voilà, we have it right in MATLAB.
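Since frequencies in the book are normalized so that 0.5 corresponds to Nyquist, converting to Hertz is just a multiplication by the sampling rate:

```python
sample_rate = 44100.0     # from the book structure above
freq_normalized = 0.0146  # book.atom(3).params.freq(1)

freq_hz = freq_normalized * sample_rate
print(freq_hz)   # about 644 Hz
```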

## MPTK 0.5.8 Installation on Mac OS X 10.6.4

Since June 24, 2010, Matching Pursuit Toolkit version 0.5.8 has been available. So today I decided to upgrade from 0.5.6. In the process I ran into some curious issues. First a little background information on my computer:

My machine, a.k.a. the "Sparse Approximation Station (SpASt)", is a small theater-sized iMac with a 3.33 GHz Intel Core 2 Duo processor, 8 GB of RAM, and a 2 TB hard drive. I am running Snow Leopard 10.6.4.

I love this idea: Future of Music. It is a subversive misappropriation of systems for music recommendation and automatic playlist generation: instead of suggesting music to add, it helps you find and remove data from your music collection based on your tastes. And on top of that, the application is created by a co-founder of a commercial music recommendation business!

Can we make a system that will go through my numerous travel photos and delete the ones that are not composed very well, or that do not have interesting enough subjects? I wonder if a business can be built around this idea.

## Excellent Post-doc opportunity

The METISS team at INRIA Rennes, France, is offering a postdoc position in real-time music source separation in the context of a project with Audionamix, the leading company in source separation (see details below). Applications including a full resume, a letter of motivation and up to three reference letters must be sent by email to the principal investigator before October 1, 2010. Phone interviews of selected candidates will be held during the first week of October.

TITLE: Real-time music source separation
DURATION: 2.5 years
RECRUITMENT DATE: as soon as possible and no later than January 1, 2011
SALARY: according to experience
PRINCIPAL INVESTIGATOR: Emmanuel Vincent (emmanuel.vincent@inria.fr)
CO-PRINCIPAL INVESTIGATOR: Rémi Gribonval (remi.gribonval@inria.fr)

## A Briefly Annotated Timeline of Similarity in Audio Signals

Oct. 26, 2010, NB: This "living document" assembles a collection of works that address similarity search in audio data. Have I forgotten anything so far? My briefly annotated timeline of similarity search in time series is here.

It is difficult to define boundaries on work in and applications of similarity search in audio data, so I restrict the below annotated timeline to work that addresses the following problems:
1. querying a collection of sampled sound signals by a sampled sound query (query by example, fingerprinting);
2. clustering a collection of audio signals based on a measure of similarity.
I do not refer to any work that operates only on metadata supplied by humans or assigned by other sources, or on high-level representations such as MIDI. For instance, I don't consider works describing querying a MIDI database by humming. I am also not considering speaker verification, although that certainly uses interesting concepts of similarity.

## Paper of the Day (Po'D): Weighted Voting of Sparse Representation Classifiers for Recognizing Face Emotion Edition

Hello, and welcome to Paper of the Day (Po'D): Weighted Voting of Sparse Representation Classifiers for Recognizing Face Emotion Edition. Today's paper is: S. F. Cotter, "Weighted voting of sparse representation classifiers for facial expression recognition," in Proc. European Signal Process. Conf., (Aalborg, Denmark), pp. 1164-1168, Aug. 2010. This work is an outgrowth of related work:
• S. F. Cotter, "Sparse representation for accurate classification of corrupted and occluded facial expressions," Proc. Int. Conf. Acoustics, Speech, Signal Process., (Dallas, TX, USA), pp. 838-841, Mar. 2010.
• J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, pp. 210-227, Feb. 2009.
• J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan, "Sparse representation for computer vision and pattern recognition," Proc. IEEE, Mar. 2009.
Also of interest are the following related papers:
• P. J. Phillips, "Matching pursuit filters applied to face identification," IEEE Trans. Image Process., vol. 7, pp. 1150-1164, Aug. 1998.
• T. V. Pham and A. Smeulders, "Sparse representation for coarse and fine object recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, pp. 555-567, Apr. 2006.
And with regards to sparse weighted voting classifiers:
• N. Goldberg and J. Eckstein, "Sparse weighted voting classifier selection and its LP relaxations," RUTCOR Research Report 9-2010 (Rutgers University, Piscataway, NJ), May 2010.
In the interest of extreme brevity, here is my one line description of the work in this paper:
With stunning accuracy, we can segment a face with and without occlusions into regions, sparsely approximate each region with a dictionary of other faces similarly segmented and labeled with an emotion, and use a simple voting approach to recognize the emotion.
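To illustrate the flavor of this approach (not the paper's exact pipeline: the dictionary, data, and plain majority vote below are hypothetical stand-ins for the segmented-face dictionaries and weighted voting of the paper), here is a toy sketch of sparse-representation classification with voting, using a tiny orthogonal matching pursuit:

```python
import numpy as np

def omp(D, x, k):
    """Tiny orthogonal matching pursuit: greedily select k columns of D for x."""
    residual, support = x.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    c = np.zeros(D.shape[1])
    c[support] = coeffs
    return c

def classify_region(D, labels, x, k=3):
    """Sparse-code x over D; pick the class whose atoms leave the smallest residual."""
    c = omp(D, x, k)
    residuals = {lab: np.linalg.norm(x - D[:, labels == lab] @ c[labels == lab])
                 for lab in np.unique(labels)}
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(1)
dim, per_class, classes = 20, 10, 3
# Hypothetical training dictionary: unit-norm columns clustered by class
centers = rng.standard_normal((dim, classes))
D = np.hstack([centers[:, [c]] + 0.1 * rng.standard_normal((dim, per_class))
               for c in range(classes)])
D /= np.linalg.norm(D, axis=0)
labels = np.repeat(np.arange(classes), per_class)

# A test "face" split into regions near class 1: each region votes, majority wins
regions = [centers[:, 1] + 0.1 * rng.standard_normal(dim) for _ in range(5)]
votes = [classify_region(D, labels, x) for x in regions]
predicted = max(set(votes), key=votes.count)
print(votes, predicted)   # expect class 1 to win the vote
```

The paper weights each region's vote by how discriminative that region is; the unweighted majority vote here is only the simplest version of that idea.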

## Some EUSIPCO 2010 Papers

There were a lot of excellent papers at the recent EUSIPCO conference, and it would have been nice to summarize each one on the day I saw it presented; but believe it or not, the hotel at which I stayed did not offer free wireless. So instead I watched Gordon Ramsay fix yet another problematic restaurant with his "fowl language," learned how easy it is to buy antibiotics in some European countries and not others, and caught several episodes of Seinfeld. Now I feel refreshed and excited to dig into my stack of new papers. In the coming days I will review the following intriguing papers:

## Paper of the Day (Po'D): Music Genre Classification via Compressive Sampling Edition

Hello, and welcome to Paper of the Day (Po'D): Music Genre Classification via Compressive Sampling Edition. Today's paper is: K. Chang, J.-S. R. Jang, and C. S. Iliopoulos, "Music genre classification via compressive sampling," in Proc. Int. Soc. Music Information Retrieval, (Amsterdam, the Netherlands), pp. 387-392, Aug. 2010.

Bob L. Sturm, Associate Professor
Audio Analysis Lab
Aalborg University Copenhagen
A.C. Meyers Vænge 15
DK-2450 Copenhagen SV, Denmark
Email: bst_at_create.aau.dk