[Digikam-devel] Re: file hash creation: asking for short test

Julien Narboux Julien at narboux.fr
Thu Dec 9 12:16:08 GMT 2010


Here are my results:

Directory scanning and hash generation took 29.5804 ms/file

Success: All 39557 files have a different hash.


On 09/12/2010 12:04, Marcel Wiesweg wrote:
> Hi,
> we are using an MD5 hash over parts of a file to uniquely identify images and
> display thumbnails. This has worked quite well, but recently I have seen two
> or three cases where the hash fails (same hash for completely different
> images).
> There is another problem with the current hash: it relies on a binary blob of
> the metadata produced by Exiv2, but this format is not guaranteed to be stable
> (the hash may change with a new Exiv2 version).
> The recommendation by Andreas Huggel was to simply use the first 100kB of a
> file, which will typically include the file header, the metadata, and reach
> actual image data.
> A variant would be to include the last 100kB as well.
> Attached is a small application which scans a given collection directory,
> creates the hash, and will output if the hash is successful in differentiating
> all files.
> I have run this on my collection, but I would ask you to repeat testing with
> your collections to find out if it works for you as well:
> qmake testhash.pro
> make
> ./testhash /toplevel/directory/to/your/collection
> Here it takes 15s per 1000 files.
> At the end, it will tell you if any files failed, or if it succeeded. If it
> fails, it would be interesting to find out if the files are actually very
> similar, and if they have the same file size. (a hard failure would be two
> dissimilar files with the same file size)
> Thanks
> Marcel
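
For readers without the attached test program, the scheme described above (MD5 over the first 100 kB of each file, optionally also the last 100 kB, followed by a check that no two files share a hash) might look roughly like the sketch below. This is not the testhash application from the message or digiKam's actual code. The names `partial_hash`, `find_collisions` and `include_tail` are made up for illustration, and 100 kB is assumed to mean 100 * 1024 bytes.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 100 * 1024  # assumption: "100kB" taken as 100 * 1024 bytes


def partial_hash(path, chunk=CHUNK, include_tail=False):
    """Return the MD5 hex digest of the first `chunk` bytes of a file.

    With include_tail=True the last `chunk` bytes are hashed as well
    (the "variant" mentioned in the message). For files shorter than
    `chunk`, head and tail overlap, which is harmless here.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        md5.update(f.read(chunk))
        if include_tail:
            size = os.fstat(f.fileno()).st_size
            f.seek(max(size - chunk, 0))
            md5.update(f.read(chunk))
    return md5.hexdigest()


def find_collisions(root, include_tail=False):
    """Walk `root` and return {hash: [paths]} for hashes shared by 2+ files."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[partial_hash(path, include_tail=include_tail)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

An empty result from `find_collisions` corresponds to the "all files have a different hash" outcome. Any reported group can then be inspected by hand for file size and visual similarity, as the message asks. Note that exact duplicate files will always be reported, since they are byte-identical.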
