[Digikam-devel] Re: file hash creation: asking for short test

Gilles Caulier caulier.gilles at gmail.com
Thu Dec 9 11:41:12 GMT 2010


Here is the trace from my office computer:

[gilles at localhost Download]$ ./testhash /mnt/data/Rep1
Scanned "/mnt/data/Rep1/0.9.2-splashcreens/Juergen Flosbach" , 9 files
and 0 subdirectories
Scanned "/mnt/data/Rep1/0.9.2-splashcreens" , 0 files and 1 subdirectories
Scanned "/mnt/data/Rep1/221460" , 1 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Alpha 450 samples" , 21 files and 0 subdirectories
Scanned "/mnt/data/Rep1/CanonVsdigiKam" , 9 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Corrupted" , 3 files and 0 subdirectories
Scanned "/mnt/data/Rep1/dimgscale" , 1 files and 0 subdirectories
Scanned "/mnt/data/Rep1/HDR/aligned" , 13 files and 0 subdirectories
Scanned "/mnt/data/Rep1/HDR/Set1" , 3 files and 0 subdirectories
Scanned "/mnt/data/Rep1/HDR" , 27 files and 2 subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd/Serie01" , 4 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd/Serie02" , 5 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd/Serie03" , 13 files and
0 subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd/Serie04" , 5 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd/Serie05" , 3 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd/Serie06" , 2 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd/Serie07" , 3 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd/Serie08" , 5 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Arnd" , 0 files and 8 subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Daniel" , 6 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Julien" , 6 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Light Table/From Seb" , 4 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Light Table" , 1 files and 4 subdirectories
Scanned "/mnt/data/Rep1/NEW" , 14 files and 0 subdirectories
Scanned "/mnt/data/Rep1/NEW2" , 60 files and 0 subdirectories
Scanned "/mnt/data/Rep1/PhotoShop 7.0" , 6 files and 0 subdirectories
Scanned "/mnt/data/Rep1/pipo" , 1 files and 0 subdirectories
Scanned "/mnt/data/Rep1/processed" , 0 files and 0 subdirectories
Scanned "/mnt/data/Rep1/SAMPLES" , 12 files and 0 subdirectories
Scanned "/mnt/data/Rep1/SAMPLES2" , 55 files and 0 subdirectories
Scanned "/mnt/data/Rep1/SONY" , 15 files and 0 subdirectories
Scanned "/mnt/data/Rep1/splash" , 9 files and 0 subdirectories
Scanned "/mnt/data/Rep1/test/pipo" , 2 files and 0 subdirectories
Scanned "/mnt/data/Rep1/test" , 23 files and 1 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/DNG(RAWconverter)" , 41 files
and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/GPS" , 4 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/HOTPIXELSTOOL" , 6 files and
0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/JP2" , 9 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/JPEG/Horizontal" , 49 files
and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/JPEG/Vertical" , 30 files and
0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/JPEG" , 8 files and 2 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/LENSFUN" , 22 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Metadata/Adobe" , 11 files
and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Metadata/B&W" , 10 files and
0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Metadata/digiKam" , 3 files
and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Metadata/LR" , 1 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Metadata/Picasa" , 3 files
and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Metadata/Vista" , 20 files
and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Metadata" , 15 files and 6
subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/newpictures" , 4 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/NOISE" , 13 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/OGG" , 3 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/PCD" , 5 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Photoshop" , 28 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/PNG" , 14 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/PPM" , 4 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/RAW/Horizontal" , 61 files
and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/RAW/Mix" , 39 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/RAW/Vertical" , 16 files and
0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/RAW" , 0 files and 3 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Red Eyes" , 10 files and 0
subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/TIFF" , 27 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/Video" , 61 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/WDP" , 1 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs/XCF" , 3 files and 0 subdirectories
Scanned "/mnt/data/Rep1/Test Photographs" , 3 files and 20 subdirectories
Scanned "/mnt/data/Rep1/test.cameragui" , 9 files and 0 subdirectories
Scanned "/mnt/data/Rep1" , 0 files and 20 subdirectories
Directory scanning and hash generation took 3.66133 ms/file
Success: All 874 files have a different hash.
[gilles at localhost Download]$ ./testhash /mnt/data/Rep2
Scanned "/mnt/data/Rep2/test/processed" , 5 files and 0 subdirectories
Scanned "/mnt/data/Rep2/test" , 13 files and 1 subdirectories
Scanned "/mnt/data/Rep2" , 0 files and 1 subdirectories
Directory scanning and hash generation took 16.4444 ms/file
Success: All 18 files have a different hash.
[gilles at localhost Download]$ ./testhash /mnt/data/Rep3
Scanned "/mnt/data/Rep3/Alina Dinu <alina at londonvisa.co.uk>" , 1 files
and 0 subdirectories
Scanned "/mnt/data/Rep3/Arturo Mann <arturo.mann at gmail.com>" , 4 files
and 0 subdirectories
Scanned "/mnt/data/Rep3/Aykut Turhan <turhana at gmail.com>" , 1 files
and 0 subdirectories
Scanned "/mnt/data/Rep3/Benoit Courty <benoit.courty at gmail.com>" , 2
files and 0 subdirectories
Scanned "/mnt/data/Rep3/cabaflo <cabaflo at wanadoo.fr>" , 2 files and 0
subdirectories
Scanned "/mnt/data/Rep3/Christophe Keckeis <cke at dvdream.ch>" , 1 files
and 0 subdirectories
Scanned "/mnt/data/Rep3/D Vanraes" , 1 files and 0 subdirectories
Scanned "/mnt/data/Rep3/Eric Bayard <bayard_e at yahoo.fr>" , 3 files and
0 subdirectories
Scanned "/mnt/data/Rep3/Frédéric Martinot <fmartinot at gmail.com>" , 4
files and 0 subdirectories
Scanned "/mnt/data/Rep3/Gerhard Kulzer" , 10 files and 0 subdirectories
Scanned "/mnt/data/Rep3/Gustavo Pichorim Boiko
<gustavo.boiko at gmail.com>" , 4 files and 0 subdirectories
Scanned "/mnt/data/Rep3/Josh & Erica Nijenhuis
<ejnijenhuis at gmail.com>" , 3 files and 0 subdirectories
Scanned "/mnt/data/Rep3/Jürgen Flosbach dk
<juergen.flosbach at bigfoot.com>" , 7 files and 0 subdirectories
Scanned "/mnt/data/Rep3/Markus Volkmer <markus at thunderblade.info>" , 1
files and 0 subdirectories
Scanned "/mnt/data/Rep3/Mathias Ball <meb at leitstern.de>" , 3 files and
0 subdirectories
Scanned "/mnt/data/Rep3/Michel Pottier <Michel.Pottier at free.fr>" , 6
files and 0 subdirectories
Scanned "/mnt/data/Rep3/Paul Radford <paul at radfordnz.net>" , 3 files
and 0 subdirectories
Scanned "/mnt/data/Rep3/Pol <d.paolino at gmail.com>" , 1 files and 0
subdirectories
Scanned "/mnt/data/Rep3/Roger Larsson <roger.larsson at norran.net>" , 1
files and 0 subdirectories
Scanned "/mnt/data/Rep3/Sébastien Benoit <seb.ben at sympatico.ca>" , 1
files and 0 subdirectories
Scanned "/mnt/data/Rep3" , 0 files and 20 subdirectories
Directory scanning and hash generation took 8.98305 ms/file
Success: All 59 files have a different hash.
[gilles at localhost Download]$ ./testhash /mnt/data/Rep4
Scanned "/mnt/data/Rep4/new" , 10 files and 0 subdirectories
Scanned "/mnt/data/Rep4/test/processed" , 607 files and 0 subdirectories
Scanned "/mnt/data/Rep4/test" , 42 files and 1 subdirectories
Scanned "/mnt/data/Rep4" , 0 files and 2 subdirectories
Directory scanning and hash generation took 5.9393 ms/file
Success: All 659 files have a different hash.
[gilles at localhost Download]$ ./testhash /mnt/data/Rep5
Scanned "/mnt/data/Rep5/2010-07-16/jpg" , 64 files and 0 subdirectories
Scanned "/mnt/data/Rep5/2010-07-16" , 0 files and 1 subdirectories
Scanned "/mnt/data/Rep5/Sur Aix/2010-07-27/jpg" , 9 files and 0 subdirectories
Scanned "/mnt/data/Rep5/Sur Aix/2010-07-27" , 0 files and 1 subdirectories
Scanned "/mnt/data/Rep5/Sur Aix/2010-07-29/arw" , 11 files and 0 subdirectories
Scanned "/mnt/data/Rep5/Sur Aix/2010-07-29/png" , 2 files and 0 subdirectories
Scanned "/mnt/data/Rep5/Sur Aix/2010-07-29" , 0 files and 2 subdirectories
Scanned "/mnt/data/Rep5/Sur Aix" , 0 files and 2 subdirectories
Scanned "/mnt/data/Rep5/test" , 0 files and 0 subdirectories
Scanned "/mnt/data/Rep5" , 2 files and 3 subdirectories
Directory scanning and hash generation took 15.6023 ms/file
Success: All 88 files have a different hash.
[gilles at localhost Download]$
[gilles at localhost Download]$ ./testhash /mnt/data/Camera
Scanned "/mnt/data/Camera" , 12 files and 0 subdirectories
Directory scanning and hash generation took 1.58333 ms/file
Success: All 12 files have a different hash.
[gilles at localhost Download]$

There are not a lot of files here. I can run it on my home computer if
you want, but only this weekend.

Gilles


2010/12/9 Marcel Wiesweg <marcel.wiesweg at gmx.de>:
> Hi,
>
> we are using an MD5 hash over parts of a file to uniquely identify images and
> display thumbnails. This has worked quite well, but recently I have seen two
> or three cases where the hash fails (same hash for completely different
> images).
> There is another problem with the current hash: it relies on a binary blob of
> the metadata produced by Exiv2, but this format is not guaranteed to be stable
> (possibly, the hash changes with a new Exiv2 version).
>
> The recommendation by Andreas Huggel was to simply use the first 100kB of a
> file, which will typically include the file header, the metadata, and reach
> actual image data.
> A variant would be to include the last 100kB as well.
>
> Attached is a small application which scans a given collection directory,
> creates the hash, and will output if the hash is successful in differentiating
> all files.
>
> I have run this on my collection, but I would ask you to repeat testing with
> your collections to find out if it works for you as well:
>
> qmake testhash.pro
> make
> ./testhash /toplevel/directory/to/your/collection
>
> Here it takes 15s per 1000 files.
> At the end, it will tell you if any files failed, or if it succeeded. If it
> fails, it would be interesting to find out if the files are actually very
> similar, and if they have the same file size. (a hard failure would be two
> dissimilar files with the same file size)
>
> Thanks
> Marcel
>
>
> _______________________________________________
> Digikam-devel mailing list
> Digikam-devel at kde.org
> https://mail.kde.org/mailman/listinfo/digikam-devel
>
>
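
For readers without the attachment, the head-of-file hash described in the
quoted message can be sketched roughly as follows. This is only a minimal
Python sketch, not the attached testhash code (a Qt application whose source
is not shown in this thread). The names partial_md5, find_collisions and
CHUNK are invented for illustration, and reading "100kB" as 100 * 1024 bytes
is an assumption.

```python
import hashlib
import os
from collections import defaultdict

# "100kB" from the thread; the exact byte count is an assumption.
CHUNK = 100 * 1024


def partial_md5(path, chunk=CHUNK, include_tail=False):
    """MD5 over the first `chunk` bytes, optionally plus the last `chunk` bytes."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        md5.update(f.read(chunk))
        if include_tail:
            size = os.fstat(f.fileno()).st_size
            # Start the tail after the head so short files are not hashed twice.
            f.seek(max(chunk, size - chunk))
            md5.update(f.read(chunk))
    return md5.hexdigest()


def find_collisions(root, **kwargs):
    """Group files under `root` by partial hash; keep groups with more than one file."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Record the size too, since equal sizes matter when judging a collision.
            groups[partial_md5(path, **kwargs)].append((path, os.path.getsize(path)))
    return {h: files for h, files in groups.items() if len(files) > 1}
```

Each colliding group carries the file sizes, so a "hard failure" in the sense
of the quoted message (dissimilar files with the same size) is easy to pick
out. Byte-identical duplicates in a collection will also show up as
collisions with this sketch, and passing include_tail=True corresponds to the
"last 100kB as well" variant.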


