[Digikam-users] Images.uniquehash calculation
Remco Viëtor
remco.vietor at wanadoo.fr
Mon Jun 17 19:49:26 BST 2013
On Monday 17 June 2013 19:24:45 Marcel Wiesweg wrote:
>
> > Disclaimer: this is probably not the right list to ask this. If so,
> > just let me know. Also, I'm not subscribed, so please CC me in the
> > answers.
> >
> > I'm trying to write a script that is able to take an image already
> > in digikam's database and resize it, apply the same tags as the
> > original, and possibly remove the original. So far the idea is that
> > this script will be independent of digikam, touching its database when
> > needed. So I checked the database structure and it looks ok, except for
> > the md5sum. I tried to reimplement DImgLoader::uniqueHashV2() in
> > libs/dimg/loaders/dimgloader.cpp:329, and even reimplementing it in
> > python with the same libraries (qt4's md5) and copying the algo line by
> > line, I get different values in the database and with the script. Am I
> > missing something? For comparison, I attach the script I did.
>
> That's the fun of a hash... Well, I don't know.
> For debugging, I would record the binary data you feed into the hash in
> Python and C++ to a file, compare that one. If it differs, you'll be able
> to locate the problem. If not, there's a difference in the hash
> implementation, but I doubt that.
>
> Marcel
According to the code, the same hashing routine is used (not only the same
algorithm, but actually the same implementation).
There is one difference between the two routines though:
- in the Digikam C++ routine, a data block is fed to the hash only if data
was actually read
- in the Python routine, this check is omitted, and the data block is added
to the data to be hashed /unconditionally/.
For the second data block (the last 100 kB), which is read right after a
seek, that could make a difference if the file is smaller than 100 kB:
- in C++, the file is probably in an error state after the seek, so no data
is read and the second data block is not fed to the hash routine.
- in Python, the data block /is/ fed, but it will probably contain rubbish
if the file is smaller than 100 kB.
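To illustrate the check, here is a minimal Python sketch, not the actual
Digikam code. It assumes the hash covers the first and the last 100 kB of
the file, and that the second block is skipped when the seek cannot succeed
(my reading of the C++ behaviour, not verified). It uses hashlib's MD5
instead of Qt's, and partial_md5 and BLOCK are names made up for the example.

```python
import hashlib
import os

BLOCK = 100 * 1024  # assumed block size of 100 kB


def partial_md5(path, block=BLOCK):
    """Hypothetical helper: MD5 over the first and last `block` bytes.

    A block is fed to the hash only if data was actually read.
    """
    md5 = hashlib.md5()
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(block)
        if head:  # feed the first block only if something was read
            md5.update(head)
        # Assumption: for files shorter than `block` the seek would fail,
        # so the second block is not read and not hashed at all.
        if size >= block:
            f.seek(size - block)
            tail = f.read(block)
            if tail:  # same check for the second block
                md5.update(tail)
    return md5.hexdigest()
```

With this guard, a file smaller than 100 kB is hashed from its first block
only, which is what the C++ routine would do if the seek really fails.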
Also, if the Python script changes anything in the metadata (e.g. by
recording the correct image size), the first 100 kB will differ.
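To check which of these cases applies, the bytes fed to the hash on each
side can be dumped to a file and compared, as Marcel suggested. A small
sketch of the comparison step follows; first_difference is a made-up
helper, and how the two dumps are produced is left open.

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if equal.

    If one input is a prefix of the other, the offset is the length of
    the shorter one.
    """
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

An offset inside the first 100 kB would point at changed metadata, while
dumps of different length would point at the second data block.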
Remco