August 2010 Archives

EUSIPCO 2010 update

So far this conference has been great, with a good showing of some heavy hitters in the field of signal processing.

For about an hour, my Mallat number was near zero as I watched from near the front row as Stéphane Mallat gave an excellent presentation of recent work by one of his PhD students. Of course I didn't understand all of it, but he made me feel as if I did! My take on the work is that it creates a feature space that is invariant to many of the deformations present in, e.g., databases of written digits. His focus here is no longer on perfect reconstruction of a signal, but on perfect classification no matter the deformation. The method even has some parallels with quantum mechanics; and the automatic digit recognition rates were convincing, even when random textures were applied to the digits -- which destroys any notion of all digits being efficiently representable as points on some low-dimensional manifold. Since by the end of the talk we had not coauthored a paper, my Mallat number jumped back to around 4 or 5.

Oct. 13, 2010, NB: This "living document" assembles a collection of works that address similarity search in large databases of time series, starting in 1993 and ending in 2004. I am not so clear on what has happened since then --- so references and pointers are appreciated. My briefly annotated timeline of similarity search in audio and music signals is here.
Hello, and welcome to Paper of the Day (Po'D): Minimum Distances in High-Dimensional Musical Feature Spaces Edition. Today's paper is: M. Casey, C. Rhodes, and M. Slaney, "Analysis of minimum distances in high-dimensional musical spaces," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, pp. 1015-1028, July 2008.

In the interest of extreme brevity, here is my one line description of the work in this paper:
With a proper analysis of the distribution of distances in groups of features that are known to be unrelated, and the application of locality sensitive hashing, a range of problems in audio similarity search is essentially nailed in a highly efficient manner (Kudos!).
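To make the locality sensitive hashing idea concrete, here is a minimal sketch in Python. It is not the scheme used by Casey et al.; it is a generic random-hyperplane (cosine) LSH, and every name and parameter in it (`lsh_signature`, `n_bits`, the perturbation size) is my own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signature(x, planes):
    """Signature = sign pattern of x against a set of random hyperplanes;
    vectors separated by a small angle usually share the same signature."""
    return ((planes @ x) > 0).astype(int)

n_bits, dim = 16, 12
planes = rng.standard_normal((n_bits, dim))  # one hyperplane per signature bit

x = rng.standard_normal(dim)
near = x + 0.01 * rng.standard_normal(dim)   # a tiny perturbation of x
far = rng.standard_normal(dim)               # an unrelated vector

hx, hnear, hfar = (lsh_signature(v, planes) for v in (x, near, far))
print("Hamming(x, near):", int(np.sum(hx != hnear)))
print("Hamming(x, far): ", int(np.sum(hx != hfar)))
```

Two vectors at a small angle agree on almost all signature bits, while unrelated vectors disagree on about half of them; hashing features into such signatures lets a search examine only the candidates that land in the same bucket instead of scanning the whole database.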
Hello, and welcome to Paper of the Day (Po'D): Audio Signal Representations for Indexing in the Transform Domain Edition. Today's paper is: E. Ravelli, G. Richard, and L. Daudet, "Audio signal representations for indexing in the transform domain," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, pp. 434-446, Mar. 2010.

In the interest of extreme brevity, here is my one line description of the work in this paper:
Here we see some of the advantages to building descriptive features from multiscale and sparse representations of audio data instead of from transform-domain representations offered by state-of-the-art audio codecs.


This weekend we finally got a TV. Imagine my excitement when I turned it on to hear the phenomenal news that, "Scientists have now created a formula that increases the fullness of your eyelashes by 35% over every other product purporting to do the same" --- or something like that. But a 35% increase?

Congratulations eyelash formula scientists!

That is an incredible achievement; I should know because in my line of work I would gain incredible recognition and untold funding dollars with numbers like that, and probably fuller eyelashes. Let's just hope it is true, and it does not go the way of the whole Fleischmann & Pons cold fusion scandal. At least Fleischmann & Pons didn't have Eva Longoria as a spokesperson. Show me the evidence FnP! Just like Longoria's eyelashes. They were practically touching the television camera.

Which reminds me, I think it is fantastic that The Mathworks are getting influential spokespeople for their product. MATLAB slashes the price for students by booking Slash! What is not to love about that?

Hello, and welcome to my first Four-hour Power Hour (not in one sitting), a span of time during which I will attempt to confront and control the near-critical mass of literature that is not only threatening the structural integrity of my desk and coaxing my colleagues to call me disposophobic, but also impeding my research. Below, I review and briefly summarize several papers (some of which have moved with me from Paris, some of which have moved with me from Santa Barbara to Paris and then from Paris, and some of which are completely new). My selection of papers and brief discussion of each are only in the context of my present research task (a major revision of a submission on audio similarity, search, and retrieval in sparse domains), and so any brevity should not be taken as a reflection of the quality of the work. (And of course, if I have misrepresented anything, please let me know and I will correct it.)
Hello, and welcome to Paper of the Day (Po'D): Natural Sounds Segmenting, Indexing, and Retrieval Edition. Today we continue with the theme of working with natural and/or organic sounds: G. Wichern, J. Xue, H. Thornburg, B. Mechtley, and A. Spanias, "Segmentation, indexing, and retrieval for environmental and natural sounds," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, pp. 688-707, Mar. 2010.

In the interest of extreme brevity, here is my one line description of the work in this paper:
Here we find an all-in-one system for automatically segmenting and efficiently searching environmental audio recordings, based in no minor way on probabilistic models of six simple short- and long-term features... very impressive!
Hello, and welcome to Paper of the Day (Po'D): Fingerprinting to Identify Repeated Sound Events Edition. Today's paper is: J. Ogle and D. P. W. Ellis, "Fingerprinting to identify repeated sound events in long-duration personal audio recordings," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., vol. 1, (Honolulu, Hawaii), pp. 233-236, Apr. 2007. This work is essentially a study of the ShazamWow! Fingerprint (Po'D: A. Wang, "An industrial strength audio search algorithm," in Proc. Int. Conf. Music Info. Retrieval, (Baltimore, Maryland, USA), pp. 1-4, Oct. 2003.) applied not just to songs, but to audio in general, e.g., environmental sounds like ringing phones, doorbells, and other sounds common to technological human existence.

In the interest of extreme brevity, here is my one line description of the work in this paper:
The ShazamWow! Fingerprint works far better for identifying longer sounds with a spectral structure rich in tonal components than for sounds that are short and/or have a simple spectral structure.

Hello, and welcome to Paper of the Day (Po'D): An Industrial Strength Audio Search Algorithm Edition. Since I am working to address the critiques of a recently reviewed article of mine, I am performing a thorough literature search on similarity search in audio, within the wider world of time-series data --- which is itself massive. Among these numerous works is this paper, which details (though not completely) what appears to be the correct solution to a thorny problem: A. Wang, "An industrial strength audio search algorithm," in Proc. Int. Conf. Music Info. Retrieval, (Baltimore, Maryland, USA), pp. 1-4, Oct. 2003.

In the interest of extreme brevity, here is my one line description of the work in this paper:
With the ShazamWow! Fingerprint we can perform song search and information retrieval in a scalable and robust manner by comparing hashes of short-time magnitude spectra reduced to high-entropy fingerprints.
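The landmark idea can be caricatured in a few lines of Python. And it is only a caricature: Wang's system extracts constellations of many spectrogram peaks per frame and packs each (f1, f2, Δt) landmark into a compact integer hash, none of which is reproduced here; one peak per frame and all parameter choices are my own simplifications.

```python
import numpy as np

def landmark_hashes(signal, n_fft=512, hop=256, fan_out=3):
    """Toy landmark fingerprint: take the strongest FFT bin of each frame,
    pair it with the peaks of the next few frames, and keep (f1, f2, dt)."""
    peaks = []
    window = np.hanning(n_fft)
    for start in range(0, len(signal) - n_fft, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + n_fft] * window))
        peaks.append(int(np.argmax(spectrum)))
    hashes = set()
    for t1, f1 in enumerate(peaks):
        for dt in range(1, fan_out + 1):
            if t1 + dt < len(peaks):
                hashes.add((f1, peaks[t1 + dt], dt))
    return hashes

sr = 8000
t = np.arange(sr // 2) / sr
tone = np.concatenate([np.sin(2 * np.pi * 440 * t),   # 0.5 s at 440 Hz
                       np.sin(2 * np.pi * 880 * t)])  # 0.5 s at 880 Hz
query = tone[1000:7000]   # an excerpt, misaligned with the frame grid

db = landmark_hashes(tone)
q = landmark_hashes(query)
print("fraction of query hashes found:", len(q & db) / len(q))
```

Because each hash encodes the time offset between two peaks rather than their absolute positions, an excerpt of a recording shares most of its hashes with the full recording, which is what makes the lookup robust to cropping and misalignment.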

Hello, and welcome to Paper of the Day (Po'D): Sound Retrieval and Ranking Using Sparse Auditory Representations Edition. Today's papers are similar to the Po'D a few days ago:
  1. G. Chechik, S. Bengio, E. Ie, D. Lyon, and M. Rehn, "Large-scale content-based audio retrieval from text queries," in Proc. ACM Int. Conf. Multimedia, (Vancouver, BC, Canada), pp. 105-112, Oct. 2008.
  2. R. F. Lyon, M. Rehn, S. Bengio, T. C. Walters, and G. Chechik, "Sound retrieval and ranking using sparse auditory representations," Neural Computation, vol. 22, no. 9, 2010.
I will focus on the second paper since it is an extension of the first.

In the interest of extreme brevity, here is my one line description of the work in this paper:
We can learn links between high-level textual keywords and low-level auditory features using principles of sparse coding, and then use this to retrieve and rank audio data using textual and semantic queries.

Comments Re-enabled

I have re-enabled commenting, but this time with registration. Sorry about that, but the spam was becoming overwhelming.

Otherwise, I am happy to respond to private email correspondence.

Mendeley anyone?

I have recently learned about Mendeley, and after poking around for a bit I have decided to give it a try. I have imported, uploaded and shared my references as a collection called Audio, Music, and Sparsity. It says there are 788 references, but I see several duplicates. There does not appear to be an easy way to remove those.
Woohoo! My paper ("Cyclic Matching Pursuits with Multiscale Time-frequency Dictionaries," with Mads G. Christensen) has been accepted to the 2010 Asilomar Conference on Signals, Systems, and Computers. The following are other papers at this conference with titles that spark my interest.
Hello, and welcome to Paper of the Day (Po'D): Finding Similar Acoustic Events Using Matching Pursuit Edition. Today's paper is one of those few that I have read where it feels like I am looking at the back of the book to see the solution to my odd-numbered problem: C. Cotton and D. P. W. Ellis, "Finding similar acoustic events using matching pursuit and locality-sensitive hashing," in Proc. IEEE Workshop App. Signal Process. Audio and Acoustics, (Mohonk, NY), pp. 125-128, Oct. 2009. The reason I say this is that I worked on similar problems last year during my postdoc, and I just received word (peer review of my article, "Audio Similarity in a Multiscale and Sparse Domain," submitted in January) that I must revisit the problem in no minor way --- but more on this in the days and weeks to come. Not being omniscient (Mom always said that it's no fun being a know-it-all, barring certain deities), I was completely unaware of this work until yesterday; and of course, when I read the title my stomach sank --- but that's just from my piece of Danish dream cake.
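For readers who have not met it, here is a minimal sketch of plain matching pursuit in Python (not the locality-sensitive hashing stage of Cotton and Ellis, and not a cyclic variant); the dictionary, the test signal, and all names are my own illustrative choices, assuming unit-norm atoms.

```python
import numpy as np

def matching_pursuit(x, D, n_iter=10):
    """Greedy matching pursuit: repeatedly subtract the projection of the
    residual onto its most-correlated unit-norm atom (column of D)."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual
        k = int(np.argmax(np.abs(corr)))
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return coeffs, residual

rng = np.random.default_rng(1)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)        # normalize the atoms

x = 2.0 * D[:, 5] - 1.5 * D[:, 40]    # a signal that is 2-sparse in D
coeffs, residual = matching_pursuit(x, D, n_iter=20)
print("residual norm:", np.linalg.norm(residual))
```

Each iteration removes the component of the residual along the best-matching atom, so the residual energy never increases; the selected atoms and their coefficients form the sparse representation over which similarity between acoustic events can then be computed.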
Hello, and welcome to Paper of the Day (Po'D): Sparse Coding for Drum Sound Classification Edition. Today's paper will be presented at the 3rd International Workshop on Machine Learning and Music (MML'10): S. Scholler and H. Purwins, "Sparse coding for drum sound classification and its use as a similarity measure," in Proc. Int. Workshop Machine Learning Music ACM Multimedia, (Firenze, Italy), Oct. 2010.
A few Po'Ds ago I discussed the work of Phil Schniter et al.: Paper of the Day (Po'D): Fast Bayesian Pursuit Algorithm for Sparse Linear Models Edition. I started out fine, but I ended in a befuddled mess, with a confused discussion of their atom selection criterion, simplified by moi, involving a singular matrix masquerading as an invertible one --- my bad. Phil has very graciously contributed time to help me understand this work more clearly, along with a link to his MATLAB code --- Huzzah! for reproducible research.
Hello, and welcome to Paper of the Day (Po'D): Sparse Time-relative Auditory Codes and Music Genre Recognition Edition. Today's paper is P.-A. Manzagol, T. Bertin-Mahieux, and D. Eck, "On the use of sparse time-relative auditory codes for music," in Proc. Int. Soc. Music Information Retrieval, (Philadelphia, PA), Sep. 2008.

About this Archive

This page is an archive of entries from August 2010 listed from newest to oldest.

July 2010 is the previous archive.

September 2010 is the next archive.
