[Digikam-users] Images.uniquehash calculation

Jean-François Rabasse jean-francois.rabasse at wanadoo.fr
Mon Jun 17 20:14:40 BST 2013


On Monday 17 Jun 2013 17:24:45 Marcel Wiesweg wrote:

> That's the fun of a hash... Well, I don't know.
> For debugging, I would record the binary data you feed into the hash
> in Python and C++ to a file, compare that one. If it differs, you'll be
> able to locate the problem. If not, there's a difference in the hash 
> implementation, but I doubt that.

Implementations are the same, as this is mandatory for using MD5 as
file signatures. (One should be able to produce an MD5 hash with any
MD5 software, then check it against the file with any other MD5 software.)

What is different in Marcos's problem is the way the file is read.

The Digikam C++ code opens the file in binary mode:
  if (!file.open( QIODevice::Unbuffered | QIODevice::ReadOnly ))
and this is the default with Qt.
(Text mode requires an explicit extra flag, QIODevice::Text.)

Marcos's Python script opens the file in text mode, the default with
Python's open(). (The Unix open() primitive itself makes no distinction
between text and binary.)
Depending on the Python version and platform, text mode can mean that
each time the image file (a binary stream) contains bytes that look like
line endings, CR (old Mac OS) or CR-LF (Microsoft), they are translated
into a Unix line feed, LF.
Thus the 100 KB buffer is modified and of course the MD5 hash computed
from it is modified too.
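
A small sketch of the effect, assuming Python 3 (where text mode applies
universal newline translation by default). The payload, the helper name
read_both_ways and the latin-1 decoding are all made up for illustration;
this is not taken from Marcos's script:

```python
import hashlib
import os
import tempfile

# Made-up "binary" payload containing a CR-LF pair and a lone CR.
payload = b"\x89PNG\r\nAB\rCD\n"

def read_both_ways(data):
    """Write data to a temp file, then read it in binary and in text mode.

    Returns (raw bytes, bytes as seen through text mode, md5 of each).
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:      # binary mode: bytes are untouched
            raw = f.read()
        # Text mode; latin-1 is chosen only so that every byte decodes.
        with open(path, "r", encoding="latin-1") as f:
            translated = f.read().encode("latin-1")
    finally:
        os.remove(path)
    return (raw, translated,
            hashlib.md5(raw).hexdigest(),
            hashlib.md5(translated).hexdigest())

raw, translated, md5_raw, md5_translated = read_both_ways(payload)
print(len(raw), len(translated))     # the translated buffer is shorter
print(md5_raw == md5_translated)     # the two digests differ
```

The data read in text mode has lost bytes, so any hash computed from it
cannot match one computed from the original bytes.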

Marcos, open the file in binary mode in your script, line #10:
  f = open(fname, 'rb')
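
For reference, a minimal sketch of a binary-mode read followed by an MD5
computation. The helper name md5_of_first_chunk and the choice to hash only
the first 100 KB are assumptions based on the buffer size mentioned above;
this is not meant to reproduce Digikam's actual uniquehash algorithm:

```python
import hashlib

# Assumed chunk size, taken from the 100 KB figure mentioned above.
CHUNK_SIZE = 100 * 1024

def md5_of_first_chunk(fname, size=CHUNK_SIZE):
    """Return the MD5 hex digest of the first `size` bytes of fname."""
    # 'rb': read raw bytes, with no newline translation and no decoding.
    with open(fname, "rb") as f:
        return hashlib.md5(f.read(size)).hexdigest()
```

Called as md5_of_first_chunk("photo.jpg") (the file name is hypothetical), it
should return the same digest on any platform, since the bytes fed to MD5 are
exactly the bytes on disk.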

