patch for new feature: acoustic fingerprinting and audio similarity
Jeff Mitchell
kde-dev at emailgoeshere.com
Thu Aug 14 20:36:50 UTC 2008
On Thursday 14 August 2008, Soren Harward wrote:
> On Tue, Aug 12, 2008 at 1:53 AM, Jeff Mitchell
> <kde-dev at emailgoeshere.com> wrote:
> > -- You appear to calculate the fingerprint for a track when the track is
> > first accessed. Although you are caching it in the database, this is
> > likely to be a very long process. So forcing it on the user is a no-no.
>
> Okay, I've been thinking about how best to handle scanning files to
> calculate the fingerprints. I agree that the "calculate on load" is a
> bad idea. So I wrote a separate program, modeled on the
> collectionscanner, which takes a list of files and calculates
> fingerprints for them, writing the results as an XML file on STDOUT.
> Now I'm trying to figure out how to best get that data into Amarok.
> As I think about it, there are a couple of different ways I could do
> this:
>
> 1. Integrate the fingerprinting algorithm into collectionscanner, so
> that the fingerprint is just one more XML field in the result. This
> would be the easiest thing to do, but it makes the collection scanner
> very slow, and it recalculates fingerprints for files that already
> have them.
>
> 2. Change ScanManager so that it runs two processes in series: the
> collectionscanner, and then the fingerprinter only on files that are
> still missing fingerprints. This, to me, seems like the best option,
> though it requires major changes to ScanManager.
>
> 3. Change SqlCollection so that it has ScanManager and
> FingerprintScanManager, running them as needed. The collection would
> need to make sure they didn't run on top of each other.
>
> So, suggestions about which one of these approaches I should follow?
I think you missed my point :-P

Anything that has to be scanned in needs to be as fast as possible. Storing
the fingerprints in a so-called "permanent" table might alleviate the problem,
but initial scans, or scans on new computers, could still be very, very slow
(maybe #3 would help with that), especially over networks. I don't want to
speak for the other developers as to whether a very long initial scan is okay,
but my guess is that they'd agree that it's not.

So, if it's something that can be run on user demand (say, a script), perhaps
look at embedding the calculated fingerprint data into an appropriate field
in the file's metadata. Then it can be done once per file, without having
to be redone any time the database gets hosed, and scanning it in would be
very quick, i.e. the embedded AFT approach. The drawback of this speedup,
of course, is that it requires users to be okay with you modifying their
files. (If this approach is something you want to pursue, we could even look
at putting both functions in a single program.)
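
To make the embedded approach concrete, here is a rough sketch of the split
between a fast scan and an on-demand fingerprinter. Everything in it is an
assumption for illustration only: the tag name ACOUSTIC_FINGERPRINT, the Track
class (a plain dict stands in for the file's real metadata), and the
placeholder compute_fingerprint() are hypothetical and are not taken from
Amarok or from the patch.

```python
from dataclasses import dataclass, field

# Hypothetical tag name; the real field would depend on the tag format.
FINGERPRINT_TAG = "ACOUSTIC_FINGERPRINT"


@dataclass
class Track:
    path: str
    # Stands in for the metadata embedded in the audio file itself.
    tags: dict = field(default_factory=dict)


def compute_fingerprint(track):
    """Placeholder for the slow acoustic analysis (not a real algorithm)."""
    return "fp:" + track.path


def scan(tracks):
    """Fast collection scan: only reads embedded fingerprints, never computes them.

    Returns (database, missing), where database maps path -> fingerprint and
    missing lists the tracks that have no embedded fingerprint yet.
    """
    database = {}
    missing = []
    for track in tracks:
        fingerprint = track.tags.get(FINGERPRINT_TAG)
        if fingerprint is None:
            missing.append(track)
        else:
            database[track.path] = fingerprint
    return database, missing


def fingerprint_on_demand(tracks):
    """User-triggered step: compute each missing fingerprint once and embed it.

    This modifies the track's tags, which is the drawback mentioned above.
    """
    for track in tracks:
        if FINGERPRINT_TAG not in track.tags:
            track.tags[FINGERPRINT_TAG] = compute_fingerprint(track)
```

With this split, rebuilding the database only needs scan(); the expensive
step runs only when the user asks for it, and only for tracks that have no
embedded fingerprint.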
--Jeff