[Digikam-devel] [Bug 262452] New: duplicate uniqueHash (image hash) in database, wrong thumb on images

Elle Stone l.elle.stone at gmail.com
Fri Jan 7 20:29:34 GMT 2011


https://bugs.kde.org/show_bug.cgi?id=262452

           Summary: duplicate uniqueHash (image hash) in database, wrong
                    thumb on images
           Product: digikam
           Version: 1.7.0
          Platform: Ubuntu Packages
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: NOR
         Component: Database
        AssignedTo: digikam-devel at kde.org
        ReportedBy: l.elle.stone at gmail.com


Version:           1.7.0 (using KDE 4.4.5) 
OS:                Linux

I processed one raw file multiple times with ufraw and saved the output as tifs
under different names. The images are visually very different renditions, and
they all have different md5sums when running md5sum at the command line.

In the digikam database, most of the renditions have the wrong thumb. So I
created a test database with only 8 images: 2 raw files, one tiff from one of
the raw files, several tiffs (visually very different renditions from each
other) from the other raw file, and one jpeg from the raw file (probably not
produced by ufraw). In digikam4.db there are 8 entries in the Images table, 5
of which have the same uniqueHash. In thumbnails-digikam.db there are only 4
thumbs.

Right-clicking on the thumbs and selecting "edit" does open the correct image
file, as does opening the preview.

So I used ufraw to produce 3 tifs and 2 jpegs from the other raw file. The
jpegs got different uniqueHashes, but the tifs all share the same uniqueHash,
giving me 13 images in the database and only 7 uniqueHashes.

Reproducible: Always

Steps to Reproduce:
Put a raw file into a directory. Open the raw file with ufraw and produce a tif.
Do this a couple more times, making the images look wildly different so there is
no question that the images are not the same. Save each time under a different
name. Then open digikam and rescan the directory (or import a new collection if
the directory is under a different root).

Actual Results:  
Use SQLite database browser to inspect the digikam data and thumbs databases.
You'll see an entry in the Images table for each tif, but they'll all share the
same uniqueHash. Initially the images may or may not have different thumbs, but
after some browsing the thumbs collapse, so that all the images with the same
uniqueHash end up with the same thumb.
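
The same check can be scripted instead of done by eye in a database browser.
The following is a minimal sketch, not part of digikam: it assumes only the
Images table and its uniqueHash column as named above, and the helper name
shared_hashes and the demo table are hypothetical.

```python
import sqlite3


def shared_hashes(conn):
    """Map each uniqueHash used by more than one Images row to its row count."""
    query = (
        "SELECT uniqueHash, COUNT(*) FROM Images "
        "WHERE uniqueHash IS NOT NULL "
        "GROUP BY uniqueHash HAVING COUNT(*) > 1"
    )
    return dict(conn.execute(query).fetchall())


# Stand-in table for demonstration only; the real Images table has more columns.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE Images (id INTEGER PRIMARY KEY, uniqueHash TEXT)")
demo.executemany(
    "INSERT INTO Images (uniqueHash) VALUES (?)",
    [("aaa",), ("aaa",), ("aaa",), ("bbb",), ("ccc",)],
)
print(shared_hashes(demo))  # {'aaa': 3}
```

To run it against real data, pass a connection to a copy of the database, for
example shared_hashes(sqlite3.connect("digikam4.db")), with digikam closed.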

Expected Results:  
I'd expect each tif-rendition/version of the original raw file, saved under
different names, would have truly unique uniqueHashes, and would have their own
correct thumbs.

jpegs from ufraw don't seem to have this problem. I haven't checked other
tif-producing software (but I will). Using exiftool to inspect a couple of the
ufraw-produced tifs, it looks like ufraw 0.16 copies all the raw file data over
to the tiff, so all the metadata in the two images looks (at a quick glance) to
be identical. If uniqueHash depends on metadata to generate its hashes, then
that could be the source of the problem.

As md5 in itself is subject to hash collisions, it seems to me that in a large
image database, using only a part of the image to calculate md5 hashes is not
such a good idea, even apart from the current issue. As already stated, the
actual md5 hashes of the images, as calculated by md5sum at the command line,
are all different. (Probably a move to sha1 (over the whole image) would be
overkill. And probably I don't know enough about hashes to even make these
statements.)
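
To illustrate the concern, here is a minimal sketch of how a hash over only the
start of a file can coincide for two files whose full md5 hashes differ. It is
not digikam's actual algorithm, which I have not inspected: the function names,
the prefix lengths and the synthetic "files" are all assumptions made up for
the example.

```python
import hashlib


def full_md5(data: bytes) -> str:
    """MD5 over the whole content, as md5sum computes it for a file."""
    return hashlib.md5(data).hexdigest()


def prefix_md5(data: bytes, n: int) -> str:
    """MD5 over the first n bytes only (a hypothetical partial-hash scheme)."""
    return hashlib.md5(data[:n]).hexdigest()


# Two synthetic "files": an identical leading block standing in for copied
# metadata, followed by different pixel data.
header = b"M" * 4096
file_a = header + b"\x00" * 1024
file_b = header + b"\xff" * 1024

print(full_md5(file_a) == full_md5(file_b))                  # False
print(prefix_md5(file_a, 4096) == prefix_md5(file_b, 4096))  # True
```

If the hashed region covers only bytes that the renditions have in common, the
partial hashes match however different the rest of the image data is.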
