[Digikam-devel] Re: file hash creation: asking for short test

Martin Klapetek martin.klapetek at gmail.com
Thu Dec 9 11:29:15 GMT 2010


Hi Marcel,

here are my results:

Directory scanning and hash generation took 35.9236 ms/file
Success: All 4557 files have a different hash.
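
For anyone curious what a check like this involves, here is a rough Python
sketch. It is not the attached testhash program (its source is not reproduced
in this mail); the function names and the use of a plain MD5 over the first
100 kB are my assumptions for illustration only.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 100 * 1024  # assumed: 100 kB, the size mentioned in the quoted mail


def head_hash(path, chunk=CHUNK):
    """MD5 hex digest of the first `chunk` bytes of the file (hypothetical helper)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(chunk)).hexdigest()


def find_collisions(root, hash_func=head_hash):
    """Walk `root` and return {hash: [(path, size), ...]} for hashes shared by >1 file."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[hash_func(path)].append((path, os.path.getsize(path)))
    return {h: files for h, files in by_hash.items() if len(files) > 1}


def report(root):
    """Print every colliding group with file sizes, so same-size cases stand out."""
    collisions = find_collisions(root)
    for h, files in collisions.items():
        sizes = {size for _path, size in files}
        note = "same size" if len(sizes) == 1 else "different sizes"
        print(f"{h} ({note}):")
        for path, size in files:
            print(f"  {size:>12}  {path}")
    if not collisions:
        print("No two files share a hash.")
```

Calling `report("/toplevel/directory/to/your/collection")` lists each group of
files that share a hash. Whether two colliding files are really dissimilar
still has to be judged by looking at them.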

Also, I may be misunderstanding this, but wouldn't reading the beginning of
the file be cheaper than reading the end in terms of I/O? (To read the end
you first have to move the file "cursor" to somewhere near the end, whereas
for the beginning you just open and read.)
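
To make the difference concrete, here is a minimal Python sketch of the two
variants from the quoted mail: hashing only the first 100 kB, and hashing the
first plus the last 100 kB. The function name, the `include_tail` parameter
and the handling of short files are my assumptions, not what digiKam or the
test program actually does.

```python
import hashlib
import os

CHUNK = 100 * 1024  # assumed: 100 kB, as suggested in the quoted mail


def partial_hash(path, include_tail=False, chunk=CHUNK):
    """MD5 over the first `chunk` bytes, optionally followed by the last `chunk` bytes."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Head only: open and read sequentially from offset 0, no seek needed.
        md5.update(f.read(chunk))
        if include_tail:
            size = os.fstat(f.fileno()).st_size
            if size > chunk:
                # Tail variant: one additional seek towards the end of the file.
                # Start no earlier than the end of the head so that no byte is
                # hashed twice in files shorter than 2 * chunk.
                f.seek(max(chunk, size - chunk))
                md5.update(f.read(chunk))
    return md5.hexdigest()
```

With `include_tail=False` the file is only opened and read from the start;
with `include_tail=True` there is exactly one extra seek and one extra read
per file.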

Marty

On Thu, Dec 9, 2010 at 12:04, Marcel Wiesweg <marcel.wiesweg at gmx.de> wrote:

> Hi,
>
> we are using an MD5 hash over parts of a file to uniquely identify images
> and display thumbnails. This has worked quite well, but recently I have
> seen two or three cases where the hash fails (same hash for completely
> different images).
> There is another problem with the current hash: it relies on a binary blob
> of the metadata produced by Exiv2, but this format is not guaranteed to be
> stable (possibly, the hash changes with a new Exiv2 version).
>
> The recommendation by Andreas Huggel was to simply use the first 100kB of a
> file, which will typically include the file header, the metadata, and reach
> actual image data.
> A variant would be to include the last 100kB as well.
>
> Attached is a small application which scans a given collection directory,
> creates the hash, and will output if the hash is successful in
> differentiating all files.
>
> I have run this on my collection, but I would ask you to repeat testing
> with your collections to find out if it works for you as well:
>
> qmake testhash.pro
> make
> ./testhash /toplevel/directory/to/your/collection
>
> Here it takes 15s per 1000 files.
> At the end, it will tell you if any files failed, or if it succeeded. If it
> fails, it would be interesting to find out if the files are actually very
> similar, and if they have the same file size. (a hard failure would be two
> dissimilar files with the same file size)
>
> Thanks
> Marcel