[KPhotoAlbum] Finding duplicates by MD5 revisited

Lars Clausen lars at raeder.dk
Sat Sep 9 10:37:19 BST 2006


I went back to look for duplicate images and found that my earlier
one-liner no longer worked, so I've hacked up a slightly better one
that is less dependent on the XML format.  It prints all duplicate
images (by MD5).  Slight variations on the parameters to uniq can
produce suitable input for xargs rm (though care is needed with
spaces in file names).

 grep md5sum index.xml  | sed 's/.*md5sum="\([^"]*\)".*file="\([^"]*\)".*/\1\t\2/;' | sort | uniq -D -w 32
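
One such variation, as a rough sketch rather than a tested recipe: swapping
-D for -d makes uniq print only one line per group of duplicates, and cut
then strips the hash so that only file names remain.  The function name
below is made up for illustration, and the sketch assumes GNU sed, uniq and
xargs, plus the same attribute order (md5sum before file) as the one-liner
above.

```shell
# Hypothetical helper: print one file name for each MD5 that occurs more
# than once in the given index.xml (assumes md5sum="..." precedes file="...").
list_one_per_duplicate_group() {
    grep md5sum "$1" \
      | sed 's/.*md5sum="\([^"]*\)".*file="\([^"]*\)".*/\1\t\2/' \
      | sort \
      | uniq -d -w 32 \
      | cut -f2-    # drop the 32-character hash and the tab, keep the name
}

# Look over the list before deleting anything.  GNU xargs -d '\n' splits on
# newlines only, so names containing spaces reach rm intact:
#   list_one_per_duplicate_group index.xml | xargs -d '\n' rm --
```

This removes only one file per group, so an image stored three or more times
would still have duplicates left afterwards and the pipeline would need to be
run again.  File names containing XML entities (such as &amp;) come out
undecoded and would not match the names on disk.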

-Lars




