duplicate remover
Jeff Mitchell
kde-dev at emailgoeshere.com
Sat Jun 9 14:30:16 UTC 2007
On Saturday 09 June 2007, Colin Guthrie wrote:
> Luke wrote:
> > I want to write a duplicate song remover plugin. Not
> > just removed from the playlist but deleted from the
> > disk. I searched the wiki for "duplicate" and didn't
> > find one so I assume it isn't already written. I want
> > it to be smart: delete the song with the lower
> > bitrate, make softlinks when a compilation is
> > involved, make the most complete ID3 tag replacing
> > "Unknown" with what is known amongst the dups, etc...
> >
> > First step would be to write a bunch of SQL scripts to
> > identify dups. On that front is the default amarok
> > database sufficient or should I regen my database to
> > the mysql version? I'm very impressed how amarok can
> > tell if two different songs are in the same album
> > (even when the ID3 tags don't align 1to1) so I should
> > probably research how that works.
>
> I don't necessarily want to delete from disk but I certainly don't want
> dupes showing up in the collection/playlist.
>
> I have an NFS share with all my music on it and my laptop mirrors part
> of that.
>
> When I connect the share, quite a lot of tracks show up twice! I thought
> that AFT (or whatever the latest acro is!) prevented this, but alas it
> seems not.
>
> Perhaps it's just my settings tho'.
AFT advice:
AFT will detect *exact* duplicates of files, not merely "alike" files. Even
so, it wouldn't be useful for this kind of duplicate detection unless
something were hardcoded into Amarok.
The reason is that for the file tracking functionality to work, every file
needs to have a unique identifier, which is calculated from file properties
(including tags). Multiple files with the same unique id would break
tracking, because if one of them moved you wouldn't know which one it was.
So when AFT detects a duplicate, it removes one or the other entry from the
database. This way, items (like playlist items) that fall back on a uniqueid
when their URL suddenly turns out to be invalid can always find *some* exact
copy of the song, although it may not be the one that you intended. But for
most purposes this doesn't matter: as long as the song plays, you're happy.
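To make the behaviour above concrete, here is a rough sketch. It is not
Amarok's actual code: the unique_id() function, the properties it hashes,
and the TrackTable class are all made up for illustration. It only shows why
exact duplicates collapse to a single entry and how a stale URL can still be
resolved to *some* copy.

```python
import hashlib
from typing import Callable, Optional


def unique_id(tags: dict, size: int, head: bytes) -> str:
    """Hypothetical id derived from file properties (tags, size, leading bytes)."""
    h = hashlib.md5()
    for key in sorted(tags):
        h.update(f"{key}={tags[key]}\n".encode("utf-8"))
    h.update(str(size).encode("ascii"))
    h.update(head)
    return h.hexdigest()


class TrackTable:
    """Toy stand-in for a tracking table: at most one URL per unique id."""

    def __init__(self) -> None:
        self._url_by_id = {}

    def add(self, uid: str, url: str) -> None:
        # An exact duplicate yields the same id, so it replaces the earlier
        # entry instead of creating a second row.
        self._url_by_id[uid] = url

    def resolve(self, uid: str, url: str,
                exists: Callable[[str], bool]) -> Optional[str]:
        # Keep the stored URL while it is valid; otherwise fall back to
        # whichever copy the table currently knows under this id.
        if exists(url):
            return url
        return self._url_by_id.get(uid)
```

In this toy model, adding the same song from the NFS share and from the
laptop leaves one row, and a playlist item whose URL has gone stale resolves
to whichever copy was added last.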
Unfortunately this means that you can't simply query the AFT tables looking
for duplicate unique ids, since only one entry per id is ever kept.
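If you want to find exact duplicates without relying on the AFT tables, one
option is to scan the files yourself. The sketch below is only an
assumption about how a plugin might do it (find_exact_duplicates() is a
made-up name, and it hashes whole file contents rather than using AFT's
ids). It only catches byte-identical files, not "alike" ones such as the
same song at a different bitrate.

```python
import hashlib
import os
from collections import defaultdict


def find_exact_duplicates(root: str) -> list:
    """Return groups of paths under root whose contents are byte-identical."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha1()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            groups[h.hexdigest()].append(path)
    # Only digests seen more than once are duplicates.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```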
--Jeff