[Digikam-devel] How does digikam calculate the uniqueHash ?

Wed Apr 21 17:30:02 BST 2010

> Hi again,
> 
> could anyone please point out how exactly the uniqueHash is calculated
> for the different sorts of pictures (the middle part with the exif
> data), 

libexiv2 can deliver a data packet that contains the Exif information 
packed as it would be for inclusion in a JPEG file.
Hashing that packet is technically the easiest way to get a hash over this 
information.

> and what design criteria led to the decision to use
> hash(first 8kb, exif, file length) ?

1. We want a hash
2. A hash over the complete file is too slow
3. We need parts of the file that are as unique as possible
4. The exif info typically contains the creation date, which is pretty unique,
  and photographic parameters like aperture and shutter speed
5. The first 8kb: It's not 0, it's not the full file, it's in between. It's 
small enough to be fast. In the end, an arbitrary decision.
6. The file length is pretty unique for compressed formats, because it depends 
on the compression entropy of the image data. It also carries at least a 
minimal amount of information about the end of the file, whereas the hash 
itself is calculated over the beginning.
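
To make the scheme concrete, here is a minimal Python sketch of hash(first 8kb, exif, file length). It is not the digikam code: the function name, the use of MD5, the order in which the three parts are fed in, and the encoding of the length as a decimal string are all assumptions made for illustration. The Exif packet is passed in as raw bytes, since obtaining it from libexiv2 is outside the scope of the sketch.

```python
import hashlib
import os

HEAD_BYTES = 8 * 1024  # "the first 8kb"


def unique_hash_sketch(path, exif_packet=b""):
    """Illustrative hash over (Exif packet, first 8 KiB, file length).

    Assumptions, not taken from digikam: MD5 as the digest, this
    ordering of the parts, and the length encoded as a decimal string.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(HEAD_BYTES)  # shorter files simply yield fewer bytes

    h = hashlib.md5()
    h.update(exif_packet)               # creation date, aperture, shutter speed, ...
    h.update(head)                      # cheap to read, independent of file size
    h.update(str(size).encode("ascii"))  # the only input that reflects the file's tail
    return h.hexdigest()
```

Two files that share the same first 8 kb and the same Exif packet still get different hashes as long as their lengths differ, which is what point 6 relies on. Two files that agree in all three inputs but differ somewhere after the first 8 kb would collide; that is the price of not reading the whole file.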