patch for new feature: acoustic fingerprinting and audio similarity

Tue Aug 12 12:42:15 UTC 2008

On Tuesday 12 August 2008 0:41:17 you wrote:
> So it's MusicBrainz?

No.  The big difference is that the fingerprints calculated by MusicBrainz, 
the old Moodbar, or Last.fm don't attempt to preserve what the song "sounds 
like".  With these fingerprints, no matter which mathematical function you 
use to calculate the similarity between two of them, you can't be sure that 
two songs that sound similar will end up with a greater similarity value 
than two songs that sound wildly different.  In technical 
terms, the metric space defined by these feature vectors (fingerprints) 
doesn't correspond to the metric space of perceived human audio similarity.  
The metric space defined by this new fingerprinting algorithm does.
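
To make the metric-space point concrete, here is a minimal Python sketch of 
turning the distance between two fingerprints into a similarity score.  It 
assumes a fingerprint is just a fixed-length list of numbers; the Euclidean 
distance, the exponential mapping, and the `scale` parameter are all 
illustrative assumptions, not what Marsyas or the Amarok patch computes.

```python
import math

def similarity(fp_a, fp_b, scale=1.0):
    """Return a score between 0 and 1; 1.0 means identical fingerprints.

    Hypothetical mapping: Euclidean distance pushed through exp(-d / scale).
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_a, fp_b)))
    return math.exp(-distance / scale)
```

A score like this is only useful if nearby fingerprints really do belong to 
songs that sound alike, which is the property the other fingerprints lack.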

The fingerprinting that I'm implementing in Amarok is based on an algorithm 
defined by the Marsyas suite of software, which is developed by George 
Tzanetakis.  Marsyas performed very well in last year's MIREX, specifically 
in the Audio Genre Classification and Audio Music Mood Classification tests, 
which are most directly applicable to what I'm trying to accomplish:

http://www.music-ir.org/mirex/2007/index.php/Audio_Genre_Classification_Results
http://www.music-ir.org/mirex/2007/index.php/Audio_Music_Mood_Classification_Results

And most importantly, Marsyas is GPL software that's been under development 
for 10 years.  The holdup for the audio fingerprinting program (as described 
in my announcement email) is that I'm trying to pare down the 8MB Marsyas 
source code core into only what is needed to calculate the fingerprints.  
I've got the fingerprinter working; it just isn't quite in a state that's 
ready for distribution.  I'll let you know when it is.

A program called Mirage has already been written to add similar functionality 
to Banshee, and a GSoC student (Charlotte Curtis) is working on an audio 
similarity project for Rhythmbox -- she also independently decided to use 
Marsyas, and we're sharing fingerprint calculation code.  The MusicIP Mixer 
also implements this kind of functionality, though this program is not open 
source and it lacks many features of Amarok and other audio players.

So, use cases:

1. I'm in the mood for only a certain kind of music, and I think that "Man on 
the Moon" by R.E.M. represents this mood well.  So I go to the dynamic 
playlist editor and add a bias that everything has to be more than 80% 
similar to "Man on the Moon" by R.E.M.  "The Village Green Preservation 
Society" by The Kinks and "We Both Go Down Together" by The Decemberists are 
added to the playlist, but "Killing in the Name" by Rage Against the 
Machine and "Mahadeva" by Astral Projection are excluded.
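
As a rough sketch, this kind of bias amounts to a threshold filter over the 
collection.  The function name and the `similarity` callable (anything that 
returns a score between 0 and 1 for two tracks) are hypothetical, not part 
of Amarok's dynamic playlist code.

```python
def tracks_matching_bias(seed, collection, similarity, threshold=0.80):
    """Return the tracks in `collection` more than `threshold` similar to `seed`.

    `similarity` is an assumed callable: similarity(a, b) -> score in [0, 1].
    """
    return [track for track in collection
            if track != seed and similarity(seed, track) > threshold]
```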

2. I want random shuffle, but I don't want any major jumps between genres; I 
want a "smoothed" shuffle.  So I add a restriction that each track has to be 
at least 75% similar to the previous one.  I start with "Busy Child" by The 
Crystal Method, and the following songs play:

"Take California" by The Propellerheads
"Minefields" by The Prodigy
"Galvanize" by The Chemical Brothers
"Satan" by Orbital
"We Have Explosive" by The Future Sound of London

and so on.  After an hour or so, the shuffle may wander over to "Hey Jude" by 
The Beatles (which is only 15% similar to "Busy Child"), but it got there by 
smoothly transitioning between songs.

2.1: As a perverse subcase, I could force the playlist to jump all over the 
place by specifying that each track added has to be less than 20% similar to 
the preceding.
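
One way to picture use cases 2 and 2.1 is a random walk in which every step 
has to pass a test on its similarity to the previous track.  This is only a 
sketch: the function names, the `similarity` callable, and the `accept` 
predicate are assumptions made for illustration.

```python
import random

def constrained_shuffle(start, collection, similarity, accept,
                        length=6, rng=random):
    """Build a playlist where each track's similarity to the previous
    one passes `accept`.  Stops early if no remaining track qualifies."""
    playlist = [start]
    remaining = [t for t in collection if t != start]
    while remaining and len(playlist) < length:
        previous = playlist[-1]
        candidates = [t for t in remaining
                      if accept(similarity(previous, t))]
        if not candidates:
            break
        chosen = rng.choice(candidates)
        playlist.append(chosen)
        remaining.remove(chosen)
    return playlist

def smooth(score):
    # Use case 2: at least 75% similar to the previous track.
    return score >= 0.75

def jumpy(score):
    # Use case 2.1: less than 20% similar to the previous track.
    return score < 0.20
```

Because only adjacent tracks are compared, the playlist is free to drift a 
long way from the starting track over many steps.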

Both of these *could* be done using only the artists and the "retrieve similar 
artists from Last.fm" function.  But this is only a rough approximation that 
fails to account for the diversity of an artist's music.  For example, using 
the artist similarity system, Moby and Air are about 80% similar, even though 
their respective tracks "That's When I Reach for My Revolver" and "Alone in 
Kyoto" are only about 25% similar.  Such jumps are not really desirable.

Possible use cases that would probably be better implemented as scripts which 
call on the similarity functionality:

3.  30% of my tracks are missing genre tags, and I want to assign them to 
genres, but I don't want to do it by hand.  I run a script that assigns 
tracks to genres based on how similar they are to tracks I've already 
assigned to genres.
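
A script like this could be as simple as a nearest-neighbour lookup.  The 
sketch below assumes a `similarity` callable and a dict of already-tagged 
tracks; both names are hypothetical, and a real script would probably want 
to look at more than the single closest track.

```python
def guess_genre(track, tagged, similarity):
    """Return the genre of the tagged track most similar to `track`.

    `tagged` maps already-tagged tracks to their genres.
    """
    if not tagged:
        raise ValueError("need at least one tagged track")
    nearest = max(tagged, key=lambda other: similarity(track, other))
    return tagged[nearest]

def tag_untagged(untagged, tagged, similarity):
    """Map every untagged track to a guessed genre."""
    return {track: guess_genre(track, tagged, similarity)
            for track in untagged}
```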

4. I want a "wake up" CD for the morning which starts off with "One Perfect 
Sunrise" by Orbital and ends up at "Fuel" by Metallica.  I run a script that 
picks tracks from my collection to fill the 80 minutes with songs that 
transition between the two.
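
Here is one possible sketch of such a script.  It keeps the tracks that sit 
closest to both endpoints until the disc is full, then orders them from 
most start-like to most end-like.  The `similarity` callable, the `duration` 
dict (seconds per track), and the selection heuristic are all assumptions 
for illustration, not a tested playlist algorithm.

```python
def transition_playlist(start, end, collection, similarity, duration,
                        max_seconds=80 * 60):
    """Pick tracks that bridge `start` and `end` within `max_seconds`."""
    used = duration[start] + duration[end]
    # Rank candidates by how similar they are to both endpoints combined.
    ranked = sorted(
        (t for t in collection if t not in (start, end)),
        key=lambda t: similarity(start, t) + similarity(t, end),
        reverse=True,
    )
    chosen = []
    for track in ranked:
        # Take the best-ranked tracks that still fit on the disc.
        if used + duration[track] <= max_seconds:
            chosen.append(track)
            used += duration[track]
    # Order the picks so start-like tracks come first, end-like ones last.
    chosen.sort(key=lambda t: similarity(t, end) - similarity(start, t))
    return [start] + chosen + [end]
```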

I hope that's enough to explain what I'd like to be able to accomplish, and 
why new functionality is needed.  Your feedback about how I'm currently 
implementing it is much appreciated; I'll make some changes to accommodate 
these suggestions.

-- 
Soren Harward
stharward at gmail.com