duplicate remover

Jeff Mitchell kde-dev at emailgoeshere.com
Sat Jun 9 14:30:16 UTC 2007


On Saturday 09 June 2007, Colin Guthrie wrote:
> Luke wrote:
> > I want to write a duplicate song remover plugin.  Not
> > just removed from the playlist but deleted from the
> > disk.  I searched the wiki for "duplicate" and didn't
> > find one so I assume it isn't already written.  I want
> > it to be smart: delete the song with the lower
> > bitrate, make softlinks when a compilation is
> > involved, make the most complete ID3 tag replacing
> > "Unknown" with what is known amongst the dups, etc...
> >
> > First step would be to write a bunch of SQL scripts to
> > identify dups.  On that front is the default amarok
> > database sufficient or should I regen my database to
> > the mysql version?  I'm very impressed how amarok can
> > tell if two different songs are in the same album
> > (even when the ID3 tags don't align 1to1) so I should
> > probably research how that works.
>
> I don't necessarily want to delete from disk but I certainly don't want
> dupes showing up in the collection/playlist.
>
> I have an NFS share with all my music on it and my laptop mirrors part
> of that.
>
> When I connect the share, quite a lot of tracks show up twice! I thought
> that AFT (or whatever the latest acro is!) prevented this, but alas it
> seems not.
>
> Perhaps it's just my settings tho'.

AFT advice:

AFT detects only *exact* duplicates of files, not "alike" files.  Even so, it
wouldn't be useful for this kind of duplicate detection unless something were
hardcoded into Amarok.

The reason is that for the file tracking functionality to work, every file
needs to have a unique identifier, which is calculated from file properties
(including tags).  If several files shared the same unique id, tracking would
break: if one of them moved somewhere, you wouldn't know which one it was.
So when AFT detects a duplicate, it removes one or the other entry from the
database.  This way, items (like playlist items) that fall back on a uniqueid
when their URL suddenly turns out to be invalid can always find *some* exact
copy of the song, although it may not be the one that you were intending.
But for most purposes this doesn't matter...as long as the song plays you're
happy.
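
To make the above a bit more concrete, here is a rough Python sketch.  It is
not Amarok's code: the hash inputs (a few tags plus the file size), the
names unique_id and TrackIndex, and the "first entry wins" rule are all
assumptions made up for illustration.

```python
import hashlib


def unique_id(tags, size_bytes):
    """Hypothetical id: a hash over a few tag fields plus the file size.

    The real inputs AFT uses are not described here; this is an assumption.
    """
    parts = [
        tags.get("artist", ""),
        tags.get("title", ""),
        tags.get("album", ""),
        str(size_bytes),
    ]
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


class TrackIndex:
    """Toy stand-in for the database: exactly one URL per unique id."""

    def __init__(self):
        self._url_by_id = {}

    def add(self, url, tags, size_bytes):
        uid = unique_id(tags, size_bytes)
        # An exact duplicate hashes to an id that is already present, so
        # only one entry survives (here, arbitrarily, the first one seen).
        self._url_by_id.setdefault(uid, url)
        return uid

    def resolve(self, uid):
        # Returns *some* copy of the song, or None if the id is unknown.
        return self._url_by_id.get(uid)

    def __len__(self):
        return len(self._url_by_id)
```

Adding the same song from an NFS share and from a laptop mirror yields the
same id both times, and the index ends up holding a single entry for it.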

Unfortunately this means that you can't simply query the AFT tables looking 
for duplicate unique ids...
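
A small sketch of why such a query comes back empty, using an in-memory
SQLite database.  The table name uniqueid_demo and its columns are
hypothetical, not Amarok's real schema, and INSERT OR IGNORE merely stands
in for "one of the duplicate entries gets dropped".

```python
import sqlite3


def find_duplicate_ids(rows):
    """Load (uniqueid, url) rows, then look for ids that occur twice."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: at most one row per uniqueid.
    conn.execute(
        "CREATE TABLE uniqueid_demo (uniqueid TEXT PRIMARY KEY, url TEXT)"
    )
    for uid, url in rows:
        # A second row with the same uniqueid is silently dropped.
        conn.execute(
            "INSERT OR IGNORE INTO uniqueid_demo VALUES (?, ?)", (uid, url)
        )
    dupes = conn.execute(
        "SELECT uniqueid, COUNT(*) FROM uniqueid_demo "
        "GROUP BY uniqueid HAVING COUNT(*) > 1"
    ).fetchall()
    conn.close()
    return dupes


rows = [
    ("abc123", "/nfs/music/song.mp3"),
    ("abc123", "/home/laptop/song.mp3"),  # exact duplicate of the first
    ("def456", "/nfs/music/other.mp3"),
]
duplicates = find_duplicate_ids(rows)
```

Because the duplicate never makes it into the table, the GROUP BY query has
nothing to report, even though two copies of the file exist on disk.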

--Jeff


