[digiKam-users] Bug? Not all duplicates found.

digikam-users.johnny1000 at spamgourmet.com
Fri Jan 21 13:18:48 GMT 2022


Greetings,

I can't readily find anything about this in the bug tracker.

Is anyone else seeing what is described below?

== Test setup

First I use the maintenance tools to make sure that all images are 
known to DigiKam, that all images have had their fingerprints generated, 
and that the databases have been cleaned.
I set the similarity range to 50%-100% (that is a useful range in my use 
case).
I make sure no images are added, deleted or moved between tests, so each 
test has _exactly_ the same starting point.
As far as I know all parameters stay the same all the time.
I close DigiKam, and with the application sqlitebrowser, I manually 
delete the duplicates registered in similarity.db by deleting all rows 
in the table ImageSimilarity, so DigiKam has to start from scratch when 
finding duplicates.
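
The reset step above can also be scripted instead of done by hand in sqlitebrowser. This is only a minimal sketch: the table name ImageSimilarity is the one mentioned above, while the function name and the db_path argument are made up for illustration. It assumes DigiKam is closed while it runs.

```python
import sqlite3
from contextlib import closing


def reset_similarity_table(db_path):
    """Delete every row from ImageSimilarity; return how many rows were removed.

    db_path is the location of similarity.db (hypothetical argument; the
    actual location depends on the local DigiKam configuration).
    """
    with closing(sqlite3.connect(db_path)) as conn:
        # Count first, so the return value does not depend on how the
        # driver reports row counts for a DELETE without a WHERE clause.
        (before,) = conn.execute(
            "SELECT COUNT(*) FROM ImageSimilarity"
        ).fetchone()
        conn.execute("DELETE FROM ImageSimilarity")
        conn.commit()
    return before
```

Calling reset_similarity_table with the path to similarity.db should leave the table empty, which is the same state as deleting all rows manually.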

== The test

1. Open DigiKam.
2. Go to Tools -> Maintenance and run _only_ Find duplicate items.
3. Close DigiKam.
4. Open similarity.db and check the number of rows in the table 
ImageSimilarity.
5. Close similarity.db.

Repeat 1-5 and note the number of rows in the database.
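
Steps 4 and 5 can be reduced to a single call that reads the row count. A sketch under the same assumptions as the manual steps: only the table name ImageSimilarity comes from this post, the function name and db_path are hypothetical, and DigiKam should be closed when it runs. The file is opened read-only so that checking the count cannot change it.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def count_similarity_rows(db_path):
    """Return the current number of rows in the ImageSimilarity table."""
    # Build a file: URI with mode=ro so SQLite opens the database read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        (rows,) = conn.execute(
            "SELECT COUNT(*) FROM ImageSimilarity"
        ).fetchone()
    return rows
```

Noting the returned value after each repeat gives the series of row counts described in the results.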

== Test results

In my case the number keeps growing with each repeat, as if DigiKam 
doesn't find _all_ duplicates the first time around.
Eventually it _seems_ as though the number of rows stabilizes. That is, 
no additional duplicates are found at extra repeats.
I stopped the individual test when the number hadn't changed for 3 repeats.
I have to repeat several times before the number stabilizes.
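
The stopping rule can be written down explicitly. This sketch reads "hadn't changed for 3 repeats" as "the last three recorded row counts are identical"; that reading, the function name and the window parameter are assumptions made for illustration.

```python
def has_stabilized(counts, window=3):
    """Return True if the last `window` recorded row counts are all equal.

    counts is the list of row counts noted after each repeat, oldest first.
    """
    if len(counts) < window:
        # Not enough repeats yet to decide.
        return False
    return len(set(counts[-window:])) == 1
```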

I can reproduce this behaviour at will.
I just delete all rows from the ImageSimilarity table to reset to 0 
known duplicates.

Important to note is:

1.
I see this behaviour _only_ when restarting DigiKam between each repeat.
If I leave DigiKam open between repeats, the number of rows does not change.

2.
Each time I reset the database, the number of rows after the first run 
differs from the number after the first run of the previous test.
The number at which the count stabilizes also differs from test to test.

3.
I have consistently declined to download the large binary files needed 
for face recognition and red eye removal.
I have no use for those functions, and want that 1/3 of a gigabyte for 
stuff I actually _do_ use.
I don't see any obvious connection between those two functions and the 
find duplicates function, but of course I could be wrong.

== Conclusion

Is this a bug, or is this expected behaviour?

Any insights into this would be much appreciated.

And thank you to all developers for making DigiKam for us! :o)

Best regards
Johnny :o)


