[KimDaBa] Duplicates

Lars Clausen lars at raeder.dk
Sat Dec 3 09:00:14 GMT 2005


So I found I had a number of duplicate images in my database and tried
out the Remove Duplicates plugin.  I was surprised to find that it took
forever even on Fast mode, so I looked at the index.xml file and came up
with this bash one-liner to find duplicate files:

grep md5sum index.xml | cut -d\" -f16- | sort | \
  uniq -w 32 --all-repeated=separate | cut -d\" -f5

This only compares md5sums, but it is quick, and it prints all the
duplicates with a blank line between each group.  It is also very
dependent on the XML format, since the cut commands select fields by
their position on the line.  This should be easy to reproduce in a Perl
plugin; the problem (of course) is making a good interface.
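
As a rough illustration of how the same check could depend less on
attribute order, here is a minimal sketch that looks the two attributes
up by name instead of by position.  It uses awk rather than Perl, and it
rests on assumptions: that each image element sits on a single line of
index.xml, and that the attributes are called file and md5sum.  The
function name find_dupes is made up for this example.

```shell
# Sketch only: find_dupes is a hypothetical helper, not part of KimDaBa.
# Assumes every image element sits on one line and carries file="..." and
# md5sum="..." attributes; both names are assumptions about index.xml.
find_dupes() {
    awk '
        # Return the value of the named attribute on the current line, or "".
        function attr(name,    re) {
            re = "[ \t]" name "=\"[^\"]*\""
            if (!match($0, re)) return ""
            return substr($0, RSTART + length(name) + 3, RLENGTH - length(name) - 4)
        }
        {
            sum = attr("md5sum"); file = attr("file")
            if (sum == "" || file == "") next   # skip lines without both values
            if (count[sum]++) {
                files[sum] = files[sum] "\n" file
            } else {
                files[sum] = file
                order[++n] = sum                # remember first-seen order
            }
        }
        END {
            # Print each group of two or more files, blank line between groups.
            sep = ""
            for (i = 1; i <= n; i++) {
                sum = order[i]
                if (count[sum] < 2) continue
                printf "%s%s\n", sep, files[sum]
                sep = "\n"
            }
        }
    ' "$1"
}
```

It would be called as "find_dupes index.xml".  Like the one-liner, it
only compares the stored md5sums and never reads the image files
themselves.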

-Lars

-- 
Lars Clausen (lars at raeder.dk, larsrc at gmail.com, http://lars.raeder.dk)
"I do not agree with a word that you say, but I will defend to the
 death your right to say it."
    --Evelyn Beatrice Hall paraphrasing Voltaire





