[digikam] [Bug 375573] Don't reset/destroy context after deleting one image among a set of duplicates

Fri Jan 27 07:03:39 GMT 2017

https://bugs.kde.org/show_bug.cgi?id=375573

--- Comment #3 from Mario Frank <mario.frank at uni-potsdam.de> ---
Hey Dan,

I will answer inline, since a few things came to mind.

(In reply to Dan Dascalescu from comment #2)
> Hey Mario,
> 
> Thank you for the explanation. I understand the tradeoff - accuracy in
> reporting the number of dupes, vs. speedy processing. The solution I propose
> revolved around lazy calculation - does the user care more about a precise
> number shown next to the album *when they get to see it*, or to be able to
> move on to examine the other duplicates in the cluster?

I would expect the latter to be more important than the accuracy. Thus,
delaying is an option for me.

> 
> I mentioned "when they get to see it" because after the user deletes one of
> the duplicates, the list of duplicate clusters in the left pane always
> scrolls to the top (IMO this could be improved to try to keep the scroll
> position, but digiKam probably just re-sorts the list), so if they were
> working on a duplicate cluster below the fold (i.e. if they have scrolled
> down at all), the number of duplicates in that album won't be visible
> anyway. In fact, when you deal with many clusters of duplicates, only those
> items at the top, according to the sort order (Ref. images filename, # of
> items, or Avg. similarity) will be visible.

Okay, let's switch to your terminology. By duplicates albums, we mean
what you call duplicates clusters (internally called search albums), i.e.
the entries in the left table - one duplicates album is one entry there.
Scrolling to the top is really annoying. This could be resolved,
but I will come to that later.

> 
> Not sure what you meant by "one duplicates album" (needs to be adjusted) -
> did you mean a cluster (in DUFF terminology, http://duff.dreda.org/) of
> duplicates (which may be spread across different albums), or an album that
> contains duplicates, so the count of items in the album needs to be
> adjusted? In the latter case, that count is even farther from the user's
> attention, because the user is in the Fuzzy tab, vs. in the Albums tab.
> Could the recalculation of counts be done only once, when the user leaves
> the Fuzzy tab?
> 
> Also, there are two different scenarios I see when it comes to deleting
> duplicates:
> 
> 1) Deleting images in duplicate clusters one by one, while the user looks at
> the picture in Preview Mode, to examine it in as large of a size as
> possible. In this case, only one image is deleted at a time. Would counts be
> easier to decrement in this case?

Yes, this was my first approach when I tried to fix the referenced bug.
But since the image should also vanish from other duplicates clusters, I
would have had to decrement the counts there, too. The count of images is
defined in the internal search albums as the number of image ids, and the
cluster list does not know how many of those images still exist.
Nevertheless, it is technically possible to let the cluster list know which
images still exist and which do not. But then the average similarity is no
longer correct, as it is calculated on the complete set of images.
This could also be solved, since I introduced the similarities between
images in the database shortly before the release of 5.4.

> 
> 2) Staying in Thumbnails or Table, selecting multiple images, and deleting
> them at once.
> 
> Finally, question about "the deleted image may be member of other duplicates
> albums" (this relates to the cluster vs. album distinction) - is the
> duplicate relationship transitive? I mean, if images A and B are dupes
> within the similarity range, and B is part of another cluster of duplicates,
> A should be part of that cluster too, which means only two counts need to be
> updates: the number of dupes in that cluster, and the number of items in the
> album the image belongs to.

Theoretically, you are right. If image A is a duplicate of reference images
B and C, the images B and C have *some* similarity, too. But it is like audio
streams: if stream a is part of streams b and c, the latter two have *some*
similarity at *some* position. Perhaps the similar parts make up only 2 %.
Depending on the given similarity range, this similarity is ignored. We cannot
use transitive closures here.
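
To illustrate the point, here is a minimal Python sketch. The similarity
values and the 0.90 threshold are invented for the example, and the names
(similarity, is_duplicate) are hypothetical, not digiKam code:

```python
# Hypothetical pairwise similarities (0.0 to 1.0); values invented for illustration.
similarity = {
    frozenset({"A", "B"}): 0.95,
    frozenset({"A", "C"}): 0.93,
    frozenset({"B", "C"}): 0.02,  # B and C share only a tiny part
}

def is_duplicate(x, y, threshold=0.90):
    """Return True if the two images fall within the similarity range."""
    return similarity[frozenset({x, y})] >= threshold

print(is_duplicate("A", "B"))  # True
print(is_duplicate("A", "C"))  # True
print(is_duplicate("B", "C"))  # False: A links B and C, yet they are not duplicates
```

So A appears in the clusters of both B and C, but B and C do not end up in
each other's cluster.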

So, to sum up:
If we have duplicates cluster A and we delete some image that is also part of
duplicates cluster B, we need to update both clusters in some way, either by
rescanning or by decrementing counts.
If we delete the reference image of cluster A itself, the cluster currently
vanishes. As a consequence, the internal search album is removed and you lose
context. This is a problem which was not addressed in the referenced bug, and
it is a real disturbance in the workflow.
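
The two cases can be sketched as follows. This is only a toy model in Python:
the dict layout and the function names are my assumptions for the example and
do not mirror the actual search album classes.

```python
# Hypothetical model: reference image id -> set of duplicate image ids in that cluster.
clusters = {
    "refA": {"img1", "img2"},
    "refB": {"img2", "img3"},
}

def affected_clusters(clusters, deleted):
    """Return the references of all clusters that list the deleted image as a member."""
    return sorted(ref for ref, members in clusters.items() if deleted in members)

def cluster_vanishes(clusters, deleted):
    """Return True if the deleted image is itself the reference of a cluster."""
    return deleted in clusters

print(affected_clusters(clusters, "img2"))  # ['refA', 'refB']: both need an update
print(cluster_vanishes(clusters, "refA"))   # True: this is where context is lost
```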

I would thus propose the following: the removal of an image in some duplicates
album should signal the list of duplicates clusters to update. The count of
images in each cluster is recalculated by checking which images still exist.
At the same time, the new average similarity is calculated from the
similarities of the remaining images to the reference image.
All duplicates clusters which contain only one image are removed from the list,
as they are not relevant anymore. All of this should be technically quite easy
to implement before the release of 5.5.
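
A rough sketch of that update step, in Python for brevity. The data layout and
the name refresh_clusters are hypothetical. I also assume here that the count
includes the reference image and that a cluster whose reference image is gone
is dropped, as happens today; neither is settled by the proposal.

```python
def refresh_clusters(clusters, existing):
    """Recalculate count and average similarity for each duplicates cluster.

    clusters: reference id -> {member id: similarity to the reference}
    existing: set of ids of images that still exist
    Returns:  reference id -> (image count, average similarity)
    """
    refreshed = {}
    for ref, members in clusters.items():
        if ref not in existing:
            # Assumption: keep the current behaviour, the cluster vanishes.
            continue
        remaining = {img: sim for img, sim in members.items() if img in existing}
        if not remaining:
            # Only the reference image is left, so the cluster is not relevant anymore.
            continue
        count = len(remaining) + 1  # assumption: the count includes the reference
        average = sum(remaining.values()) / len(remaining)
        refreshed[ref] = (count, average)
    return refreshed

clusters = {
    "refA": {"img1": 0.5, "img2": 1.0},
    "refB": {"img2": 0.75},
}
existing = {"refA", "refB", "img1"}  # img2 has been deleted
result = refresh_clusters(clusters, existing)
print(result)  # {'refA': (2, 0.5)}: refB is dropped, refA gets a new count and average
```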

What do the other devs think?

If this is confirmed, I will do that once I have finished my small garbage
collection project.

-- 
You are receiving this mail because:
You are the assignee for the bug.