# June 2012 Archives

## Wordles of Blues Tags in a music genre dataset

After labeling the Blues portion of this music genre dataset (of 100 excerpts, I have yet to identify 4), I extracted the last.fm tags people have applied to the songs, along with their "counts", whatever that means (it is a number from 0 to 100). Then I took these tags and counts and created wordles, which provide a graphical representation of word frequency: the larger the font size of a tag, the more frequently it appears in the dataset.

In the wordle for the Blues category, we can see "Cajun" and "Zydeco" are not so small. Of the 100 excerpts, 24 are Cajun and/or Zydeco. I hear ample use of the Blues progression (which is also used in Country and Pop and Rock and ...), and sometimes blue notes, and I know Zydeco blends the Blues tradition with Creole music. But the diatonic (or piano) accordion, the rhythm and tempo, and the French lyrics (when they are there) make me argue these excerpts shouldn't be in this category.

What say people who have tagged these 24 tunes on last.fm? When we create a word cloud from tags applied to only that subset, we obtain this wordle, entirely lacking any mention of "blues", save for a microscopic "bluescruiser" the same size as "boatradio".
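To make the aggregation step concrete, here is a minimal sketch of how per-track tag counts could be pooled into the weights a word cloud needs. Everything in it is an assumption for illustration: the function name `aggregate_tags`, the choice to sum the counts and to fold tags to lower case, and the toy data, which is made up and not taken from the dataset.

```python
from collections import Counter

def aggregate_tags(tracks):
    """Sum per-track tag counts into a single tag -> weight mapping.

    `tracks` is a list of dicts mapping a tag to its count (0 to 100).
    Summing counts and lower-casing tags are assumptions of this sketch.
    """
    totals = Counter()
    for tags in tracks:
        for tag, count in tags.items():
            totals[tag.lower()] += count
    return totals

# Made-up toy data for three hypothetical tracks (not real last.fm output).
toy_tracks = [
    {"zydeco": 100, "cajun": 80},
    {"Cajun": 100, "accordion": 40},
    {"zydeco": 60, "blues": 5},
]

totals = aggregate_tags(toy_tracks)
# The largest weights are the tags a word cloud would draw largest.
print(totals.most_common(3))
```

Running the same function on only a subset of tracks gives the weights for a subset word cloud.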

## WTF-measure

I am reading a paper that uses the F-measure, and thought I would do a little poking around to get a good feeling for what it embodies. In a detection problem, we have true positives (TP), true negatives (TN), and two kinds of errors: false alarms, or false positives (FP), and missed detections, or false negatives (FN).

The precision of a detection system is defined, when $$TP+FP > 0$$, as $$P := \frac{TP}{TP+FP}$$ and its recall is defined, when $$TP+FN > 0$$, as $$R := \frac{TP}{TP+FN}.$$ The F-measure of a detection system is defined as $$F := \frac{2PR}{P + R}.$$

Substituting $$P$$ and $$R$$ into this we find $$F = \frac{2TP}{2TP+FP+FN} = \frac{TP}{|+| - \frac{FN-FP}{2}}$$ where $$|+| := TP+FN$$ is the total number of positives in the sample.
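As a sanity check on the substitution, here is a minimal sketch that computes $$F$$ both ways, from $$P$$ and $$R$$ and from the closed form in terms of $$|+|$$, and confirms they agree on a small grid of counts. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def precision(tp, fp):
    if tp + fp == 0:
        raise ValueError("precision is undefined when TP + FP = 0")
    return Fraction(tp, tp + fp)

def recall(tp, fn):
    if tp + fn == 0:
        raise ValueError("recall is undefined when TP + FN = 0")
    return Fraction(tp, tp + fn)

def f_measure(tp, fp, fn):
    """F as the harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    if p + r == 0:
        raise ValueError("F is undefined when P + R = 0")
    return 2 * p * r / (p + r)

def f_closed_form(tp, fp, fn):
    """F written as TP / (|+| - (FN - FP)/2), with |+| = TP + FN."""
    positives = tp + fn
    return Fraction(tp) / (positives - Fraction(fn - fp, 2))

# Compare the two forms wherever TP > 0, so that P, R and F are all defined.
for tp in range(1, 6):
    for fp in range(6):
        for fn in range(6):
            assert f_measure(tp, fp, fn) == f_closed_form(tp, fp, fn)

print(f_measure(8, 2, 4))  # 8/11
```

Note that with $$TP = 0$$ the harmonic-mean form is undefined (it is 0/0), while the closed form gives 0 whenever $$FN + FP > 0$$.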

Now we can ask: what does it mean to say $$F \le 1/\alpha$$ for $$\alpha \ge 1$$? We see that $$\alpha$$ bounds the number of true detections. Writing $$F = TP/(TP + \frac{FN+FP}{2})$$, the condition $$F \le 1/\alpha$$ rearranges to $$TP(\alpha - 1) \le \frac{FN+FP}{2},$$ so for $$\alpha > 1$$, and since $$TP$$ is an integer, $$TP \le \left \lfloor \frac{FN+FP}{2(\alpha - 1)} \right \rfloor.$$ At the extremes, when $$F$$ reaches the bound for $$\alpha = 1$$, then $$FP = FN = 0$$, or perfect detection and discrimination; and when $$\alpha \to \infty$$, then $$TP = 0$$, or "Is the system even plugged in? Is the light on? Hold this, while I reach around the back."

So, for $$\alpha > 2$$, our number of true detections is less than the mean number of failures (misses and false alarms): $$TP < \frac{FN+FP}{2}.$$
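One way to gain confidence in this kind of bound is brute force. Rearranging $$F \le 1/\alpha$$ with $$F = 2TP/(2TP+FP+FN)$$ gives $$TP(\alpha - 1) \le (FN+FP)/2$$. The sketch below checks that equivalence, and the $$\alpha > 2$$ consequence, over a small grid of counts. The function names, the grid size and the particular values of $$\alpha$$ are arbitrary choices for illustration, and only $$TP \ge 1$$ is tested so that precision and recall are defined.

```python
from fractions import Fraction
from itertools import product

def f_measure(tp, fp, fn):
    """F-measure from counts, written as 2TP / (2TP + FP + FN)."""
    return Fraction(2 * tp, 2 * tp + fp + fn)

def rearranged_bound_holds(tp, fp, fn, alpha):
    """The rearranged condition TP * (alpha - 1) <= (FN + FP) / 2."""
    return tp * (alpha - 1) <= Fraction(fp + fn, 2)

# A few arbitrary test values of alpha >= 1, kept exact.
ALPHAS = [Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2), Fraction(4)]

def check(max_count=8):
    """Return True if both claims hold for all counts below max_count."""
    grid = product(range(1, max_count), range(max_count), range(max_count))
    for tp, fp, fn in grid:
        f = f_measure(tp, fp, fn)
        for alpha in ALPHAS:
            below = f <= 1 / alpha
            # F <= 1/alpha should hold exactly when the rearranged bound does.
            if below != rearranged_bound_holds(tp, fp, fn, alpha):
                return False
            # For alpha > 2, TP should be strictly below the mean error count.
            if below and alpha > 2 and not tp < Fraction(fp + fn, 2):
                return False
    return True

print(check())
```

A finite grid proves nothing in general, of course, but it catches sign and rounding slips quickly.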

## Formal Logic, pt. 1

I am currently writing a paper with some colleagues that has made me realize I must obtain a more solid grounding in the art of mathematical proofs. This involves venturing into the fun world of formal logic, which I am doing with the help of Velleman's "How to Prove It: A Structured Approach." In that book, there is an interesting argument, the validity of which I am supposed to test with a truth table. It goes like this:

Either sales or expenses will go up. If sales go up, then the boss will be happy. If expenses go up, then the boss will be unhappy. Therefore, sales and expenses will not both go up.
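A truth table for this argument can be enumerated mechanically. The sketch below is one possible encoding, with assumptions worth flagging: `s` stands for "sales go up", `e` for "expenses go up" and `h` for "the boss is happy"; "unhappy" is read as "not happy"; and "either ... or" is read as inclusive or. The function names are hypothetical. An argument is valid when the conclusion is true in every row where all the premises are true.

```python
from itertools import product

def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b

def premises(s, e, h):
    """All three premises, with 'unhappy' read as 'not happy' (assumption)."""
    return (s or e) and implies(s, h) and implies(e, not h)

def conclusion(s, e, h):
    """Sales and expenses will not both go up."""
    return not (s and e)

def is_valid():
    """Valid if the conclusion holds in every row where the premises hold."""
    rows = product([True, False], repeat=3)
    return all(conclusion(s, e, h) for s, e, h in rows if premises(s, e, h))

# Print the full truth table: s, e, h, premises, conclusion.
for s, e, h in product([True, False], repeat=3):
    print(s, e, h, premises(s, e, h), conclusion(s, e, h))

print("valid:", is_valid())  # valid: True
```

Under this encoding the premises hold in only two rows (sales up with a happy boss, or expenses up with an unhappy boss), and the conclusion is true in both.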

Bob L. Sturm, Associate Professor
Audio Analysis Lab
Aalborg University Copenhagen
A.C. Meyers Vænge 15
DK-2450 Copenhagen SV, Denmark
Email: bst_at_create.aau.dk