[KimDaBa] Duplicates
Lars Clausen
lars at raeder.dk
Sat Dec 3 09:00:14 GMT 2005
So I found I had a number of duplicate images in my database and tried
out the Remove Duplicates plugin. I was surprised to find that it took
forever even in Fast mode, so I looked at the index.xml file and came up
with this bash one-liner to find duplicate files:
grep md5sum index.xml | cut -d\" -f16- | sort | uniq -w 32 --all-repeated=separate | cut -d\" -f5
This compares only the md5sums, but it does so quickly, and it prints
all the duplicates with a blank line between each group. It is also very
dependent on the exact XML format. This should be easy to reproduce in a
Perl plugin; the problem (of course) is making a good interface.
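If you would rather not depend on the layout of index.xml, the same md5sum grouping can be applied to the image files themselves. This is a minimal sketch, assuming GNU coreutils (md5sum, and uniq with --all-repeated, which the one-liner above already needs) and ordinary file names without newlines or backslashes. The function name find_dupes is hypothetical. Note the trade-off: it hashes every file on disk instead of reusing the md5sums already stored in index.xml, so it will be slower on a large collection.

```shell
#!/usr/bin/env bash
# find_dupes (hypothetical name): list files under a directory whose
# contents have identical md5sums, one group per block, with a blank
# line between groups.
find_dupes() {
    local dir="${1:-.}"
    # md5sum prints "<32 hex chars>  <filename>" for each file.
    # sort brings identical hashes next to each other, and uniq -w 32
    # compares only the hash, printing every member of each repeated
    # group and separating the groups with an empty line.
    find "$dir" -type f -exec md5sum {} + \
        | sort \
        | uniq -w 32 --all-repeated=separate
}
```

Usage: `find_dupes ~/Pictures`. Appending `| cut -c35-` strips the hash and the two separating spaces, leaving only the file names.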
-Lars
--
Lars Clausen (lars at raeder.dk, larsrc at gmail.com, http://lars.raeder.dk)
"I do not agree with a word that you say, but I will defend to the
death your right to say it."
--Evelyn Beatrice Hall paraphrasing Voltaire
More information about the Kphotoalbum mailing list