[digiKam-users] Bug? Not all duplicates found.

Sat Jan 22 14:44:17 GMT 2022

Well, the result you see in the GUI certainly won't differ. The entries you
see in the ImageSimilarity table are not the complete end result.
Why are there different entries in this table? We use a QSet to store the
image IDs. QSet is unordered and stores its elements by hash, and Qt seeds
its hash function randomly when a program starts. Therefore, the order of
the image IDs in the QSet can change from one program start to the next.
As a result, the images are not always compared with one another in the
same way. This can produce additional entries in the table, but they lie
within the same similarity range. The end result is always the same.
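
To make the ordering argument concrete, here is a toy model in Python. It is
not digiKam's code: the greedy search (an image already matched into a group
is not used as a reference again), storing rows as unordered pairs, and the
helper names are assumptions chosen for illustration, and `iteration_order`
merely stands in for a hash set whose order depends on a per-run seed. Under
those assumptions, different comparison orders store different numbers of
rows, yet the same images end up flagged as duplicates.

```python
import random
from itertools import permutations


def iteration_order(image_ids, seed):
    """Stand-in for hash-set iteration order: a permutation that depends on a
    per-run seed. Toy model only, not Qt's actual QSet/qHash behaviour."""
    order = sorted(image_ids)
    random.Random(seed).shuffle(order)
    return order


def find_duplicates(order, similar):
    """Hypothetical greedy search (an assumption, not digiKam's algorithm).

    Every image not yet placed in a group becomes a reference and is compared
    against all other images. `similar` holds the unordered pairs
    (frozensets) inside the chosen similarity range. Returns the rows that
    would be stored, each as a sorted (id1, id2) tuple.
    """
    rows = set()
    grouped = set()
    for ref in order:
        if ref in grouped:
            continue  # already matched as another image's duplicate
        for other in order:
            if other != ref and frozenset((ref, other)) in similar:
                rows.add(tuple(sorted((ref, other))))
                grouped.update((ref, other))
    return rows


def flagged_images(rows):
    """The 'end result' in this model: every image that appears in a row."""
    return {image for row in rows for image in row}


# Hypothetical data: images 1~2, 2~3 and 3~4 are similar; image 5 is unique.
similar = {frozenset(pair) for pair in [(1, 2), (2, 3), (3, 4)]}

run_a = find_duplicates([1, 2, 3, 4, 5], similar)
run_b = find_duplicates([1, 4, 2, 3, 5], similar)
print(len(run_a), len(run_b))  # prints: 3 2

# Every possible comparison order flags the same images as duplicates.
assert all(
    flagged_images(find_duplicates(list(p), similar)) == {1, 2, 3, 4}
    for p in permutations([1, 2, 3, 4, 5])
)

# Simulated program starts with different hash seeds.
for seed in range(3):
    rows = find_duplicates(iteration_order([1, 2, 3, 4, 5], seed), similar)
    print(seed, len(rows), sorted(flagged_images(rows)))
```

The two hand-picked orders store three and two rows respectively, yet both
flag images 1 to 4 and leave image 5 alone. In this model, if rows from
earlier runs are kept, a new order can only add rows toward the union of all
orders tried, which would resemble the growing-then-stable counts in the
quoted message; the thread does not confirm that digiKam works exactly so.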

Maik

On Saturday, 22 January 2022, 12:15:06 CET,
digikam-users.johnny1000 at spamgourmet.com wrote:
> Greetings,
> 
> Sorry if you've already seen this, but I didn't get a copy from the
> list, so I don't know if it was actually emailed to anyone.
> 
> Original message starts here:
> 
> I can't readily find anything about this in the bug tracker.
> 
> Is anyone else seeing what is described below?
> 
> == Test setup
> 
> First I use the maintenance tools to make sure that all images are
> known to DigiKam, that all images have had their fingerprints generated,
> and that the databases have been cleaned.
> I set the similarity range to 50%-100% (that is a useful range in my use
> case).
> I make sure no images are added, deleted or moved between tests, so each
> test has _exactly_ the same starting point.
> As far as I know all parameters stay the same all the time.
> I close DigiKam, and with the application sqlitebrowser, I manually
> delete the duplicates registered in similarity.db by deleting all rows
> in the table ImageSimilarity, so DigiKam has to start from scratch when
> finding duplicates.
> 
> == The test
> 
> 1. Open DigiKam.
> 2. Go to Tools -> Maintenance and run _only_ Find duplicate items.
> 3. Close DigiKam.
> 4. Open similarity.db and check the number of rows in the table
> ImageSimilarity
> 5. Close similarity.db
> 
> Repeat 1-5 and note the number of rows in the database.
> 
> == Test results
> 
> In my case the number keeps growing with each repeat, as if DigiKam
> doesn't find _all_ duplicates the first time around.
> Eventually it _seems_ as though the number of rows stabilizes. That is,
> no additional duplicates are found at extra repeats.
> It takes several repeats before the number stabilizes; I stopped each
> test once the number had not changed for 3 consecutive repeats.
> 
> I can reproduce this behaviour at will.
> I just delete all rows from the ImageSimilarity table to reset to 0
> known duplicates.
> 
> Important to note is:
> 
> 1.
> I see this behaviour _only_ when restarting DigiKam between each repeat.
> If I leave DigiKam open between repeats, the number of rows does not change.
> 
> 2.
> Each time I reset the database, the starting number of rows is different
> from the starting number at the previous test.
> The stabilizing number is also different from test to test.
> 
> 3.
> I have consistently declined to download the large binary files needed
> for face recognition and red-eye removal.
> I have no use for those functions, and would rather keep that 1/3 of a
> gigabyte for things I actually _do_ use.
> I don't see any obvious connection between those two functions and the
> find duplicates function, but of course I could be wrong.
> 
> == Conclusion
> 
> Is this a bug, or is this expected behaviour?
> 
> Any insights into this would be much appreciated.
> 
> And thank you to all developers for making DigiKam for us! :o)
> 
> Best regards
> Johnny :o)
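
The reset-and-count steps of the quoted test (deleting every row of the
ImageSimilarity table, then reading back the row count) can also be scripted
instead of done by hand in sqlitebrowser. A minimal sketch using Python's
standard sqlite3 module; only the file name similarity.db and the table name
ImageSimilarity come from the thread, the function names are hypothetical,
the file's location depends on your setup, and DigiKam should be closed
while the script touches the database.

```python
import sqlite3
from contextlib import closing


def reset_similarity(db_path):
    """Delete all rows from ImageSimilarity (close DigiKam first)."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits the DELETE if it succeeds
            conn.execute("DELETE FROM ImageSimilarity")


def count_similarity_rows(db_path):
    """Return the number of rows currently stored in ImageSimilarity."""
    with closing(sqlite3.connect(db_path)) as conn:
        (count,) = conn.execute("SELECT COUNT(*) FROM ImageSimilarity").fetchone()
    return count
```

For example, call `reset_similarity("/path/to/similarity.db")` once before a
test series and `count_similarity_rows("/path/to/similarity.db")` after each
run, where the path is a placeholder for your own database file.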