[Digikam-devel] How does digikam calculate the uniqueHash ?

Sun Apr 11 14:27:23 BST 2010

Hi,

I am trying to write some code for a scripting language to extract the
data for a given picture from the digikam database file.

(I want to keep my raw format picture files unmodified by digikam, but
sometimes I need to automatically convert some of them to jpeg files
for export outside the digikam directory and need to extract the
information needed for IPTC from the database. So I need to identify
the id of a picture in the database.)
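To illustrate the kind of lookup I have in mind, here is a minimal sketch. It assumes (I have not verified this against the schema) that the database is an SQLite file with a table named Images holding the columns id, name and uniqueHash; the function name find_image_ids is just made up for this example.

```python
import sqlite3

def find_image_ids(conn, unique_hash):
    """Return (id, name) pairs of all rows whose uniqueHash matches.

    ASSUMPTION: table `Images` with columns `id`, `name`, `uniqueHash`.
    Adjust the names if the real schema differs.
    """
    cursor = conn.execute(
        "SELECT id, name FROM Images WHERE uniqueHash = ?",
        (unique_hash,),
    )
    return [(row[0], row[1]) for row in cursor]
```

A caller would open the database file with sqlite3.connect() and pass the hash computed for the picture in question.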

Unfortunately, the algorithm used to calculate the uniqueHash appears
to be rather unusual.

What I have found so far (from the undocumented source) is that the
uniqueHash is an MD5 sum of the concatenation of

- the exif section of the picture
- the first 8192 bytes of the picture file
- the length of the picture file written as a decimal number
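Written out as code, my understanding of these three steps looks roughly like the sketch below. This is only my reading of the source, not a confirmed description of what digikam does; the function names are made up, and how to obtain exif_bytes is deliberately left to the caller, since that is exactly the part I am unsure about.

```python
import hashlib
import os

def unique_hash_from_parts(exif_bytes, head_bytes, file_size):
    """MD5 over: Exif section + first 8192 bytes + file size in decimal.

    ASSUMPTION: the three parts are simply concatenated in this order,
    and the size is written as plain ASCII decimal digits.
    """
    md5 = hashlib.md5()
    md5.update(exif_bytes)
    md5.update(head_bytes[:8192])
    md5.update(str(file_size).encode("ascii"))
    return md5.hexdigest()

def unique_hash_for_file(path, exif_bytes):
    """Convenience wrapper that reads the head and the size from disk."""
    with open(path, "rb") as f:
        head = f.read(8192)
    return unique_hash_from_parts(exif_bytes, head, os.path.getsize(path))
```

Keeping the Exif bytes as a parameter makes it easy to try out different candidates for what the "exif section" of a raw file might be.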

With this I could correctly calculate the uniqueHash for JPEG images,
but not for raw images.

Raw images are usually based on the TIFF file format. Exif data are,
afaik, TIFF entries. Therefore TIFF files (unlike JPEG) do not have a
separate Exif section, but have Exif tags (which are in fact TIFF
tags) interwoven with the whole file.
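To show what I mean by "interwoven": in a plain TIFF file the Exif tags live in a sub-directory (IFD) that is only referenced by an offset stored in tag 0x8769 of the first IFD. The sketch below walks the first IFD and returns that offset. It assumes a classic TIFF header (byte order mark "II" or "MM", magic number 42); raw formats that deviate from this are not handled, and the function name is made up.

```python
import struct

EXIF_IFD_POINTER = 0x8769  # TIFF tag holding the offset of the Exif IFD

def find_exif_ifd_offset(data):
    """Return the file offset of the Exif IFD, or None if there is none.

    ASSUMPTION: `data` starts with a classic TIFF header (magic 42).
    """
    if data[:2] == b"II":
        endian = "<"   # little-endian
    elif data[:2] == b"MM":
        endian = ">"   # big-endian
    else:
        return None
    magic, ifd0 = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        return None
    (count,) = struct.unpack_from(endian + "H", data, ifd0)
    for i in range(count):
        # Each IFD entry is 12 bytes: tag, type, value count, value/offset.
        entry = ifd0 + 2 + 12 * i
        tag, _type, _n, value = struct.unpack_from(endian + "HHII", data, entry)
        if tag == EXIF_IFD_POINTER:
            return value
    return None
```

The Exif entries found at that offset may in turn point to values stored anywhere else in the file, so there is no single contiguous block that obviously corresponds to the Exif section of a JPEG.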

How exactly is the uniqueHash calculated for these files?

regards
Hadmut