A question for devs

Jeff Mitchell kde-dev at emailgoeshere.com
Thu Jun 21 13:09:24 UTC 2007


On Thursday 21 June 2007, Seb Ruiz wrote:
> On 21/06/07, Vladimir Kulev <me at lightoze.net> wrote:
> > Hello Amarok devs! I would like to know - what are you thinking about
> > https://bugs.kde.org/show_bug.cgi?id=144761, and would it be covered in
> > Amarok 2 database model?
>
> We haven't spoken about making changes to accommodate these sorts of
> problems. I would anticipate that it is very easy to make a database
> change by adding a foreign key which links to the duplicate song.
>
> However, my personal opinion is that it would be a big effort to
> implement such a feature, and I don't know if it could be justified.
> The hardest thing, and the most error-prone, would be determining if two
> songs are the same. How can this be done? I don't think that we can
> rely simply on tags, since many users have very poor tag management
> (think Track 01.mp3). Using an external library to analyse
> the files would also be out of the question.
>
> Seb

To go into a little more detail (I'm probably going to close the bug): the
bug suggests detecting "duplicate songs" by matching tags/metadata.  This is
actually a bad idea.  Even if the tags are *exactly* the same, that doesn't
mean the song _data_ is the same.  Maybe you have two files with the same
tags (especially if all you have is the name of the track and the artist) but
they are both VBR with different bit rates.  Clearly these are not
duplicates, then.  Maybe one is live but isn't marked as such.  You could let
the user pick which one to remove, but that gets messy.
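
As a rough illustration of the point above, here is a minimal Python sketch
that finds files sharing the same tags whose contents nevertheless differ.
The function names and the (path, tags) input format are hypothetical, and
this is not how Amarok does anything; reading real tags would need a tagging
library, which is left out here.  Note that it hashes the whole file,
including any embedded tag bytes, so it can only show that two files differ,
not that two recordings are the same.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tag_collisions(tracks):
    """Find tag tuples shared by files whose bytes differ.

    tracks: iterable of (path, tags) pairs, where tags is a hashable value
    such as (artist, title).  This input format is an assumption made for
    the sketch.
    Returns a dict mapping each colliding tags value to its list of paths.
    """
    by_tags = defaultdict(list)
    for path, tags in tracks:
        by_tags[tags].append(path)
    return {
        tags: paths
        for tags, paths in by_tags.items()
        if len({file_digest(p) for p in paths}) > 1
    }
```

Anything this returns is a pair (or more) of files that tag matching alone
would have called "duplicates" even though the data is not the same.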

This goes back to the use case in the bug of the person with both .flac 
and .mp3 files of the same music.  I used to do this too, and I never ran 
into any problem, because I put the .flac files and .mp3 files in separate 
but identical directory trees.  So my .flacs would be rooted 
in /mnt/music/FLAC and my .mp3s would be rooted in /mnt/music/MP3 with the 
directory trees underneath being the same.  Then you simply add one, or the 
other, to Amarok's collection (and if you want to use the .flacs and 
put .mp3s onto devices, there's always the File Browser).  If you already 
have your .flacs and .mp3s in the same directories, it's trivial to write a 
script to separate them.
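
For what it's worth, such a script could look roughly like the Python sketch
below.  The function name, the dry_run flag and the "mixed" source directory
in the usage note are assumptions for illustration; only /mnt/music/FLAC and
/mnt/music/MP3 come from the layout described above.

```python
import shutil
from pathlib import Path


def split_by_extension(source_root, dest_root, extension, dry_run=True):
    """Move files with the given extension into a parallel directory tree.

    Every file under source_root whose suffix matches `extension`
    (case-insensitively) is moved to the same relative path under dest_root.
    With dry_run=True nothing is moved; the planned moves are only returned.
    Returns a list of (source_path, destination_path) pairs.
    """
    source_root = Path(source_root)
    dest_root = Path(dest_root)
    wanted = extension.lower()

    # Collect the moves first so the tree is not modified while walking it.
    moves = []
    for src in sorted(source_root.rglob("*")):
        if src.is_file() and src.suffix.lower() == wanted:
            moves.append((src, dest_root / src.relative_to(source_root)))

    if not dry_run:
        for src, dst in moves:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return moves
```

Calling split_by_extension("/mnt/music/mixed", "/mnt/music/FLAC", ".flac")
lists what would be moved; passing dry_run=False actually moves the files,
and a second call with ".mp3" and /mnt/music/MP3 handles the other half.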

I don't agree with the assertion that scores become inaccurate, as scores are
based on file usage, not "song" usage.  There are many reasons why basing them
on "song" usage doesn't make a lot of sense, the main one being that
people don't usually have libraries with good, proper tagging.  And if scores
relied solely on metadata, and the metadata was changed, there goes your
score.  By scoring on file, if the metadata changes, the statistics are still
fine; if the file name changes, in the majority of cases AFT (Amarok File
Tracking) detects this and the statistics are again fine.
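
To make the file-versus-"song" distinction concrete, here is a toy Python
sketch of statistics keyed by a stable per-file identifier.  It is only an
illustration: the class, the method names and the idea of passing in a
ready-made file_id are assumptions, and it does not describe how Amarok or
AFT store anything internally.

```python
class FileStatistics:
    """Play counts keyed by a stable per-file id, not by tags or path."""

    def __init__(self):
        self.play_counts = {}   # file_id -> number of plays
        self.known_paths = {}   # file_id -> most recently seen path

    def record_play(self, file_id, path):
        """Count one play for file_id and remember where the file lives now.

        file_id is a hypothetical identifier that stays the same when the
        file is retagged or renamed.  Returns the updated play count.
        """
        self.known_paths[file_id] = path
        self.play_counts[file_id] = self.play_counts.get(file_id, 0) + 1
        return self.play_counts[file_id]
```

Because neither the tags nor the path is part of the key, retagging or
renaming a file leaves its count untouched; a table keyed on (artist, title)
would start again from zero after the same change.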

As for the other use cases (2 and 3 in the bug report), I think there are
better ways to handle these than by putting kludgy duplicate detection in
Amarok:

#2: Delete one or the other of the files when you find them.  This doesn't 
seem like it'd happen too often.
#3: Take the files that are duplicates and move them to a separate place on
your local machine that is not a part of Amarok's collection.  Then when you
have access to the NFS share, you can get at the files there; when you don't, 
add the directory back into your collection or use the file browser/Konqueror 
to add the files to your playlist when you want.  This will also keep 
statistics sane.  In Amarok 2.0 there will be support for multiple local 
collections, which would mean that you could have these files in a 
second/third/whatever collection and simply ignore that collection when you 
have access to the local NFS mount.

--Jeff


