October 2010 Archives

Paper of the Day (Po'D): Music Cover Song Identification Edition, pt. 2

Hello, and welcome to Paper of the Day (Po'D): Music Cover Song Identification Edition, pt. 2. We continue today with a paper mentioned in the post from yesterday: J. Serrà, E. Gómez, P. Herrera, and X. Serra, "Chroma binary similarity and local alignment applied to cover song identification," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, pp. 1138-1151, Aug. 2008.
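The two ingredients named in the title can be caricatured in a few lines: binarize the frame-to-frame chroma similarity between two songs, then run a Smith-Waterman-style dynamic program to find the best-scoring local alignment between the two sequences. Below is a minimal Python sketch of that idea, not the authors' implementation; the threshold, gap, and mismatch penalties are my own illustrative assumptions.

```python
import numpy as np

def binary_similarity(a, b, threshold=0.8):
    """1 where two chroma frames are similar (cosine similarity above an
    assumed threshold), 0 elsewhere. a and b are (frames x 12) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a @ b.T >= threshold).astype(float)

def local_alignment_score(S, match=1.0, mismatch=-0.9, gap=-0.7):
    """Smith-Waterman-style local alignment over a binary similarity matrix.
    Scores never drop below zero, so the best locally matching subsequence
    wins regardless of where it sits in either song."""
    n, m = S.shape
    H = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if S[i - 1, j - 1] else mismatch
            H[i, j] = max(0.0,
                          H[i - 1, j - 1] + s,   # align frame i with frame j
                          H[i - 1, j] + gap,     # skip a frame in song a
                          H[i, j - 1] + gap)     # skip a frame in song b
    return H.max()
```

A song compared against itself then scores an alignment as long as the song, while an unrelated song yields only short accidental runs.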

Papers of the Day (Po'D): Music Cover Song Identification Edition, pt. 1

Hello, and welcome to Papers of the Day (Po'D): Music Cover Song Identification Edition, pt. 1. I am currently looking at methods of measuring music similarity for cover song identification. Previously at CRISSP, I have discussed music fingerprinting, e.g., the Shazam-Wow! approach in Paper of the Day (Po'D): An Industrial Strength Audio Search Algorithm Edition. That approach uses features that are far too specific to remain insensitive to the wealth of possible transformations a song undergoes when it "goes under cover." For instance, see my examples here: Un Homme et Une Femme: Audio Fingerprinting and Matching in MATLAB. I have also discussed other approaches that attempt to generate features of varied specificity for similarity calculation, as in Paper of the Day (Po'D): Minimum Distances in High-Dimensional Musical Feature Spaces Edition, and I note many other approaches in my A Briefly Annotated Timeline of Similarity in Audio Signals. With today's papers, we add a few more to the stack:

1. D. P. W. Ellis and G. E. Poliner, "Identifying 'cover songs' with chroma features and dynamic programming beat tracking," in Proc. Int. Conf. Acoustics, Speech, Signal Process., (Honolulu, Hawaii), Apr. 2007.
2. S. Ravuri and D. P. W. Ellis, "Cover song detection: from high scores to general classification," in Proc. Int. Conf. Acoustics, Speech, Signal Process., (Dallas, TX), Mar. 2010.

Papers of the Day (Po'D): Concatenative Synthesis Edition

Hello, and welcome to Papers of the Day (Po'D): Concatenative Synthesis Edition. Today's papers are:
1. G. Coleman, E. Maestre and J. Bonada, "Augmenting Sound Mosaicing with descriptor-driven transformation," in Proc. Digital Audio Effects, Graz, Austria, Sep. 2010.
2. M. D. Hoffman, P. R. Cook and D. M. Blei, "Bayesian spectral matching: Turning Young MC into MC Hammer via MCMC sampling," in Proc. Int. Computer Music Conf., Montreal, Canada, Aug. 2009.

Processing for Artificial Intelligence Programming?

This coming spring I will teach artificial intelligence (AI) to the final-year bachelor students. AI includes things like path finding (Viterbi, A*, Dijkstra), steering behaviors (flocking, pursuit, avoidance, collision, etc.), decision making (finite state machines, decision trees, Markov chains), and maybe some more advanced material (rule learning, pattern recognition, natural language processing, neural networks, genetic algorithms). As much as possible, I want to avoid wasting energy and motivation on hard-core C++, and on the expense of our limited MATLAB licenses. What I need is a language that is simple yet has a high payout, and provides a cross-platform environment for quick and dirty cowboy coding without compiler and linker errors. So today I experimented with Processing as a possible platform upon which to motivate the various topics of AI. As a bonus, I see that several libraries have already been developed in these areas. Below is a little something I created by adapting code from a tutorial to create a random walk. (Click on it to start the ball at a new position.) After only a few hours, I have found Processing to be quick and easy, with a syntax not far from C++. This may be exactly what I need for half of the class.

Source code: randomwalkdraw

Built with Processing
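For anyone who prefers reading the idea to running the applet, the logic of such a random walk fits in a few lines. Here is a minimal Python sketch of the same idea (the names are mine, not from the randomwalkdraw source):

```python
import random

def random_walk(steps, start=(0, 0), step_size=1, seed=None):
    """2-D lattice random walk: from a starting position, take a unit step
    in a uniformly random direction (up, down, left, right) at each tick,
    and return the full path as a list of (x, y) positions."""
    rng = random.Random(seed)
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x, y = x + dx * step_size, y + dy * step_size
        path.append((x, y))
    return path
```

In the Processing version, the draw() loop plays the role of the for loop, plotting one step per frame; clicking restarts the walk at a new position.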

Medialogy Paper Awarded Top 10%!

Congratulations to my colleagues for their paper awarded the top 10% at the 2010 IEEE International Workshop on Multimedia Signal Processing: L. Turchet, R. Nordahl, S. Serafin, A. Berrezag, S. Dimitrov, and V. Hayward, "Audio-haptic physically based simulation of walking sounds", IEEE Int. Work. Multimedia Signal Process., St. Malo, France, Oct. 2010.

In this work, the authors describe their efforts at simulating the physical and acoustic sensations of walking on different materials, such as creaky wood, hard or soft snow, pebbles, and concrete. They have installed pressure sensors and actuators into a pair of sandals, which are interfaced with a computer that transmits haptic and auditory feedback based on physical models of the materials. The feedback is controlled by the foot pressure. I have tested an early prototype, and found that creaky wood and hard snow were very realistic. With this work, we "step" closer to a virtual reality that is less virtual. Kudos!

Interactive Systems Programming

I just finished teaching "Interactive Systems Programming" --- one of the graduate courses in the Medialogy section of the Department of Architecture, Design and Media Technology at Aalborg University Copenhagen. Any set of lectures and exercises I could design would not have interested most of the students, because the subject matter is so broad, not to mention that it completely spans theory to practice. (I specialize in audio and music, so my presentation would have been extremely unbalanced too.) Instead, I had each student select one research paper in his or her field of interest related in some way to interactivity, and to making human-computer interaction more natural. Over the course of the course, I helped each student read his or her paper, write a critical annotation of it, and attempt to reproduce some aspect of it --- whether that meant building an interface, running an experiment, or testing an algorithm. On the final day the students presented their papers and results. The selected papers are extremely varied, and give a good glimpse of state-of-the-art work (10 of 15 papers are from 2008 or later). Since I spent time with each of the papers over the past month, I write below my one-line description of each work.

CFP: 8th Sound and Music Computing (SMC) Conference 2011

8th Sound and Music Computing Conference, 06-09 July 2011
Department of Information Engineering, University of Padova
http://smc2011.smcnetwork.org/

The SMC Conference is the forum for international exchanges around the
core interdisciplinary topics of Sound and Music Computing. SMC 2011
will feature lectures, posters/demos, musical/sonic works, and other
satellite events. The SMC Summer School will take place just before
the Conference; it aims to give young researchers interested in the
field an opportunity to learn about some of the core interdisciplinary
topics, and to share their own experiences with other young researchers.

================Important dates=================
Deadline for submissions of papers and music: Friday 25 March, 2011
Deadline for applications to Summer School: Friday 25 March, 2011
Notification of acceptance to Summer School: Monday 18 April, 2011
Notification of paper and music acceptances: Friday 6 May, 2011
SMC 2011 Summer School: Saturday 2 - Tuesday 5 July, 2011
SMC 2011 Satellite Events: Wednesday 6 July, 2011
SMC 2011 Conference: Thursday 07 - Saturday 09 July, 2011
===========================================

Papers of the Day (Po'D): Finding or Not Finding Rules in Time Series Edition

Hello, and welcome to Papers of the Day (Po'D): Finding or Not Finding Rules in Time Series Edition. I found today's papers by accident when I was searching for texts to use in my forthcoming class, "Artificial Intelligence Programming." In our tubular library I found the once-opened book, "Advances in Econometrics, vol. 19 (2004): Applications of Artificial Intelligence in Finance and Economics." I thought this would provide some keen examples, since kids these days are interested in making money. (I have written one lab for the students where they apply linear prediction to modeling stock prices.) In this book, however, I found the paper: J. Lin and E. Keogh, "Finding or Not Finding Rules in Time Series." I was hooked from one of its first sentences: "In this work we make a surprising claim. Clustering of time series subsequences is meaningless!" What follows is what I call, "time series analysis cage fighting at its best."

In the interest of extreme brevity, here is my one line description of the work in this paper, in my own words:
You want to know what's meaningless? Clustering subsequences of time series, that's what.

Sound Quality Seminar

Yesterday I attended a half-day seminar at Widex A/S on modeling and measuring sound quality organized by the Audio Signal Processing Network in Denmark. There were four hour-long talks split between measuring sound quality and modeling sound quality with a focus on hearing impairment. (And I arrived a bit late because the taxi took me to Lyngby instead of Lynge.)

Postdoctoral Research Assistant: Sparse Representations for Audio Signals

This postdoc offer looks incredible! C4DM at QMUL is one of the research leaders in sparse representations.

-----

Centre for Digital Music: Queen Mary University of London

Greedy Decomposition of Stationary Signals

Consider decomposing a real sinusoid using matching pursuit (MP), cyclic MP (CMP), or orthogonal MP (OMP) and a dictionary of smooth time-frequency atoms built from, e.g., Hann windows of various scales $$s \in \{ 4, 8, 16, 32, 64, 128, 256 \}$$ samples, each translated by one quarter of its scale. To create an atom we modulate a window of scale $$s$$ by a complex phasor with a normalized frequency $$\omega \in \{ k/s : k = 0, 1, \ldots, \lfloor s/2 \rfloor \}$$. To be sure we have a complete dictionary, we include the Dirac basis. Real atoms are built from complex atoms by adding their conjugates; and optimal phases are phound by phrojection onto the dual atom and its conjugate (if not real). Now, with that out of the way, it is at first surprising to see the behavior of these three greedy methods. Depending on its frequency and phase, a stationary sinusoid can be decomposed into a mess of atoms, producing a model that lacks sense.
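To make the setup concrete, here is a heavily reduced Python sketch of matching pursuit with a small Hann-windowed cosine dictionary. It uses far fewer scales than described above, omits the Dirac basis, and uses two discrete phases instead of phase-optimized projections; all names are mine, and this is an illustration, not the code behind the experiments.

```python
import numpy as np

def build_dictionary(n, scales=(32, 64, 128)):
    """Real, unit-norm Hann-windowed cosine atoms at several scales and
    frequencies, each translation hopped by a quarter of its scale.
    A much-reduced stand-in for the dictionary described in the post."""
    atoms = []
    for s in scales:
        win = np.hanning(s)
        for start in range(0, n - s + 1, s // 4):
            for k in range(s // 2 + 1):
                for phase in (0.0, np.pi / 2):
                    a = np.zeros(n)
                    a[start:start + s] = win * np.cos(
                        2 * np.pi * k * np.arange(s) / s + phase)
                    norm = np.linalg.norm(a)
                    if norm > 0:
                        atoms.append(a / norm)
    return np.array(atoms)

def matching_pursuit(x, D, n_atoms=10):
    """Greedy MP: at each step, subtract from the residual its projection
    onto the single best-correlated atom."""
    r = x.copy()
    coefs, idx = [], []
    for _ in range(n_atoms):
        c = D @ r                      # correlations with all atoms
        i = int(np.abs(c).argmax())    # best atom this round
        coefs.append(c[i])
        idx.append(i)
        r = r - c[i] * D[i]            # update residual
    return np.array(coefs), np.array(idx), r
```

Running this on a pure sinusoid whose frequency falls between the dictionary's frequency bins typically shows the effect described above: the energy smears across many atoms at several scales rather than concentrating in a few.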

Un Homme et Une Femme: Audio Fingerprinting and Matching in MATLAB

At my software page I have made available a nice demonstration of audio fingerprinting using a Shazam-like approach. I have discussed this approach in Paper of the Day (Po'D): An Industrial Strength Audio Search Algorithm Edition, which refers to the following paper: A. Wang, "An industrial strength audio search algorithm," in Proc. Int. Conf. Music Info. Retrieval, (Baltimore, Maryland, USA), pp. 1-4, Oct. 2003. In my code example I use 56 different versions of the music theme from the 1966 French film "Un Homme et Une Femme", composed by Francis Lai --- one of the great French film composers of the last century. I spent a fun Sunday afternoon collecting all of these versions from YouTube. And now, two days later, my wife and I are suffering from an insufferable earworm.
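For readers who want the gist without downloading the code, the Shazam-style scheme can be caricatured in a few lines: find local peaks in a magnitude spectrogram, then hash pairs of nearby peaks as (f1, f2, Δt) "landmarks" keyed by the anchor's time. This toy Python sketch is mine; it is neither the MATLAB demonstration above nor Wang's production algorithm, and the peak and pairing rules are deliberately crude.

```python
import numpy as np

def spectrogram(x, n_fft=256, hop=128):
    """Magnitude STFT with a Hann window; returns a (freq x time) array."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

def landmark_hashes(S, fan_out=3):
    """Shazam-style constellation: keep cells that dominate their 3x3
    neighborhood and exceed the mean level, then pair each peak with the
    next few peaks into (f1, f2, dt) hashes keyed by the anchor time t1."""
    F, T = S.shape
    thresh = S.mean()
    peaks = []
    for t in range(1, T - 1):
        for f in range(1, F - 1):
            v = S[f, t]
            if v >= S[f - 1:f + 2, t - 1:t + 2].max() and v > thresh:
                peaks.append((t, f))
    hashes = {}
    for i, (t1, f1) in enumerate(peaks):
        for (t2, f2) in peaks[i + 1:i + 1 + fan_out]:
            hashes.setdefault((f1, f2, t2 - t1), []).append(t1)
    return hashes
```

Matching then amounts to intersecting hash keys between a query and the database and histogramming the differences of their anchor times; a true match shows up as a sharp spike at a single time offset.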

Paper of the Day (Po'D): Recursive Nearest Neighbor Search in a Sparse Domain Applied to Comparing Audio Signals Edition

Hello, and welcome to Paper of the Day (Po'D): Recursive Nearest Neighbor Search in a Sparse Domain Applied to Comparing Audio Signals Edition. Today's paper is our just-yesterday-submitted revised article: B. Sturm and L. Daudet, "Recursive Nearest Neighbor Search in a Sparse and Multiscale Domain for Comparing Audio Signals," submitted to Signal Processing, Sep. 2010. With that revision done, I had my first good night's rest in a while. For past entries related to this revision, see the posts of the past two months. In the interest of extreme brevity, here is my one-line description of the work in this paper:
A previously proposed approach for nearest neighbor search in a sparse domain is not practical for comparing audio signals, but its general framework produces surprising yet sensible results.

Bob L. Sturm, Associate Professor
Audio Analysis Lab
Aalborg University Copenhagen
A.C. Meyers Vænge 15
DK-2450 Copenhagen SV, Denmark
Email: bst_at_create.aau.dk