[Digikam-devel] Re: file hash creation: asking for short test

Julien Narboux Julien at narboux.fr
Thu Dec 9 12:16:08 GMT 2010


Hi,

Here are my results:

Directory scanning and hash generation took 29.5804 ms/file

Success: All 39557 files have a different hash.

Julien

On 09/12/2010 12:04, Marcel Wiesweg wrote:
> Hi,
>
> we are using an MD5 hash over parts of a file to uniquely identify images and
> display thumbnails. This has worked quite well, but recently I have seen two
> or three cases where the hash fails, i.e. completely different images get the
> same hash.
> There is another problem with the current hash: it relies on a binary blob of
> the metadata produced by Exiv2, but this format is not guaranteed to be stable,
> so the hash may change with a new Exiv2 version.
>
> Andreas Huggel recommended simply using the first 100kB of a file, which will
> typically include the file header and the metadata and reach into the actual
> image data.
> A variant would be to include the last 100kB as well.
>
> Attached is a small application which scans a given collection directory,
> creates the hash for each file, and reports whether the hash succeeds in
> differentiating all files.
>
> I have run this on my collection, but please repeat the test with your own
> collections to find out whether it works for you as well:
>
> qmake testhash.pro
> make
> ./testhash /toplevel/directory/to/your/collection
>
> On my machine it takes 15 s per 1000 files.
> At the end, it will tell you whether any files failed or whether it succeeded.
> If it fails, it would be interesting to find out whether the colliding files
> are actually very similar and whether they have the same file size (a hard
> failure would be two dissimilar files with the same file size).
>
> Thanks
> Marcel
>
> _______________________________________________
> Digikam-devel mailing list
> Digikam-devel at kde.org
> https://mail.kde.org/mailman/listinfo/digikam-devel
>    
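
The hashing scheme and the collision test described in the quoted message can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the attached testhash application (a Qt/qmake project whose source is not reproduced here): "100kB" is read as 100 * 1024 bytes, every regular file under the directory is hashed regardless of type, and the names partial_md5, find_collisions and same_size are hypothetical. It hashes the first 100 kB with MD5 (optionally also the last 100 kB), groups files by digest, and lets you check whether a colliding group shares one file size, which is the "hard failure" case unless the files are byte-identical duplicates.

```python
import hashlib
import os
from collections import defaultdict

# Assumption: "100kB" is read as 100 * 1024 bytes; digiKam's actual value may differ.
CHUNK = 100 * 1024


def partial_md5(path, include_tail=False, chunk=CHUNK):
    """MD5 over the first `chunk` bytes of a file, optionally also the last `chunk` bytes."""
    size = os.path.getsize(path)
    h = hashlib.md5()
    with open(path, "rb") as f:
        h.update(f.read(chunk))  # file header, metadata and (usually) some image data
        if include_tail and size > chunk:
            # Start the tail no earlier than the end of the head, so files
            # shorter than 2 * chunk do not have any byte hashed twice.
            f.seek(max(chunk, size - chunk))
            h.update(f.read(chunk))
    return h.hexdigest()


def find_collisions(root, include_tail=False):
    """Hash every regular file under `root`; return {digest: [paths]} for digests shared by 2+ files."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_digest[partial_md5(path, include_tail)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}


def same_size(paths):
    """True if all files in a collision group have the same size (the "hard failure"
    case, unless the files are byte-identical duplicates)."""
    return len({os.path.getsize(p) for p in paths}) == 1
```

For example, find_collisions("/toplevel/directory/to/your/collection") returns an empty dict when every file gets a distinct hash, and passing include_tail=True tries the head-plus-tail variant. Note that genuine duplicate copies of an image will always collide; those are not hash failures.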


