[Digikam-devel] [Bug 125736] New: Uniquely identifying each image in a collection of images

Duncan Hill kdebugs at nacnud.force9.co.uk
Mon Apr 17 14:14:59 BST 2006


http://bugs.kde.org/show_bug.cgi?id=125736         
           Summary: Uniquely identifying each image in a collection of
                    images
           Product: digikam
           Version: unspecified
          Platform: unspecified
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: wishlist
          Priority: NOR
         Component: general
        AssignedTo: digikam-devel kde org
        ReportedBy: kdebugs nacnud force9 co uk


Version:           0.9 SVN (using KDE 3.5.2)
Compiler:          gcc 4.0.2 prerelease 
OS:                Linux

With all versions of digiKam up to and including 0.9 SVN, there is no way to determine whether a photo is unique within a collection of albums, other than by its name (and perhaps its file size in combination with the name).

I propose that digiKam store a checksum (digest) of each image file's contents in the DB table for images.  This checksum can be generated in a 'lazy' manner (in the background), or in a 'non-lazy' manner (in the foreground, holding focus).
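
As a rough illustration of the checksum step, here is a minimal Python sketch.  The choice of SHA-1 and the name image_checksum are assumptions made for the example only; this report does not specify an algorithm, and digiKam itself is not written in Python.  The file is read in chunks so that large images are never loaded into memory whole.

```python
import hashlib


def image_checksum(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`.

    `algorithm` is an assumption for this sketch; any name accepted by
    hashlib.new() works.  The file is read `chunk_size` bytes at a time.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A 'lazy' implementation could call such a function from a background worker for every image whose checksum column is still empty; a 'non-lazy' one would call it at import time.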

Once the checksum has been generated, several things become possible.

1) Parent-child relationships (sort of like what #103350 discusses), and lineage of a photo.
2) Trivial duplicate finding.  Whether the user wants to -remove- duplicates is another issue.
3) Externally moved images don't lose meta-data in the DB.
4) Album/collection split and merge.

To clarify those points a bit:
1) Once each file is identified with a unique digest, it should be possible to link (automatically and manually) the derived (edited) images to the original image.  There is an edge case of an edited photo being a collage of multiple parents, and I'm sure this can be handled fairly easily with the right table design.

To store the parent-child relationships, a simple two-column table is needed.  Parent on the left, child on the right.
[  P   |  C   ]
---------------
[ 1234 | 2345 ]
[ 1234 | 9876 ]
[ 9876 | 1010 ]
[ 1010 | 1011 ]

1234 is the parent of 2345 and 9876.  9876 is the parent of 1010, and 1010 is the parent of 1011.  This makes 1234 the great-grandparent of 1011, and this can be displayed in any number of ways, including a radial display or a tree display.
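
The two-column table above can be sketched with SQLite as follows.  The table name lineage, the column names and the helper ancestors are hypothetical, chosen only for this example.  Using (parent, child) as a composite key lets one child have several parents, which covers the collage case.  The lineage walk uses a recursive query and assumes the data contains no cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per parent-to-child link, keyed on the pair
# so that a collage can list several parents.
conn.execute(
    "CREATE TABLE lineage ("
    " parent TEXT NOT NULL,"
    " child TEXT NOT NULL,"
    " PRIMARY KEY (parent, child))"
)
conn.executemany(
    "INSERT INTO lineage VALUES (?, ?)",
    [("1234", "2345"), ("1234", "9876"), ("9876", "1010"), ("1010", "1011")],
)


def ancestors(conn, digest):
    """Return (ancestor, distance) pairs for `digest`, nearest first.

    Assumes the lineage data is acyclic; a cycle would make the
    recursive query run forever.
    """
    return conn.execute(
        """
        WITH RECURSIVE up(id, depth) AS (
            SELECT parent, 1 FROM lineage WHERE child = ?
            UNION
            SELECT l.parent, up.depth + 1
            FROM lineage AS l JOIN up ON l.child = up.id
        )
        SELECT id, depth FROM up ORDER BY depth
        """,
        (digest,),
    ).fetchall()


print(ancestors(conn, "1011"))
# [('1010', 1), ('9876', 2), ('1234', 3)]
```

The same table read in the other direction (child looked up by parent) gives the descendants needed for a tree or radial display.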

2) Duplicate finding, as it stands, appears to be based on name.  This doesn't work well when you reset the internal numbering system of a digital camera, because unrelated photos then end up with the same file name.  With digest checksums in place, the search is a simple SELECT that groups images by checksum and keeps the checksums that occur more than once.  The user can then see each duplicate photo, and hopefully the album that the photo is in (essentially, make it part of the search interface as a pre-built search).
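
A minimal sketch of such a duplicate search, again using SQLite from Python.  The images table, its columns and the sample rows are made up for the example and are not digiKam's actual schema.  Note that the two files named IMG_0001.JPG are different photos, while the real duplicate has a different name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified images table for this sketch.
conn.execute("CREATE TABLE images (album TEXT, name TEXT, digest TEXT)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [
        ("2005/Holiday", "IMG_0001.JPG", "aaa"),
        ("2006/Easter", "IMG_0001.JPG", "bbb"),  # same name, different photo
        ("2006/Backup", "beach.jpg", "aaa"),     # different name, same photo
    ],
)


def find_duplicates(conn):
    """Return (digest, album, name) for every image whose digest is shared."""
    return conn.execute(
        """
        SELECT i.digest, i.album, i.name
        FROM images AS i
        JOIN (
            SELECT digest FROM images
            GROUP BY digest
            HAVING COUNT(*) > 1
        ) AS d ON d.digest = i.digest
        ORDER BY i.digest, i.album, i.name
        """
    ).fetchall()


print(find_duplicates(conn))
# [('aaa', '2005/Holiday', 'IMG_0001.JPG'), ('aaa', '2006/Backup', 'beach.jpg')]
```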

3) Right now, if an image is moved externally to digiKam (but within the digiKam albums tree), all meta-data is lost, and the user is probably a tad frustrated.  With checksums for every image, it becomes a case of 'Does this checksum already exist?'  If it does, present the options of:
* Move image back to original location
* Keep image in new/current location
Meta-data is intact, and the user is happy.
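
The 'Does this checksum already exist?' check could look roughly like the sketch below.  The images table with id, path and digest columns and the helpers reconcile and keep_new_location are hypothetical.  The sketch also ignores the case where the old file still exists, which would make the new file a copy rather than a move.

```python
import sqlite3


def reconcile(conn, path, digest):
    """Classify a file found at `path` during a scan of the albums tree.

    Returns ("new", None) if the digest is unknown, ("unchanged", id) if
    it is known at this path, or ("moved", id) if it is known elsewhere.
    """
    row = conn.execute(
        "SELECT id, path FROM images WHERE digest = ?", (digest,)
    ).fetchone()
    if row is None:
        return ("new", None)
    image_id, known_path = row
    if known_path == path:
        return ("unchanged", image_id)
    return ("moved", image_id)


def keep_new_location(conn, image_id, path):
    """Point the existing row at the new path.

    Meta-data keyed on the image id stays attached to the image.
    """
    conn.execute("UPDATE images SET path = ? WHERE id = ?", (path, image_id))
```

On a "moved" result the application would present the two options above; choosing to keep the new location only updates the stored path, so comments, tags and ratings keyed on the image id are untouched.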

4) Recently, on the users list, the issue of backing up and porting collections between two computers was discussed.  I think that the unique checksum concept can help here, but I'm not quite sure how yet.  It should at least help with finding duplicate imported items, as in point 2.
