[Digikam-devel] [digikam] [Bug 337688] Reading/writing of keyword-tags to jpg and xmp corrupts tag hierarchy, duplicate root tag

Christian buitk14 at A1.net
Thu Jul 24 05:04:05 BST 2014


Christian <buitk14 at A1.net> changed:

           What    |Removed                     |Added
           Severity|major                       |grave

--- Comment #15 from Christian <buitk14 at A1.net> ---
Test case explains why the tag hierarchy gets corrupted quickly:

I suggest setting the bug to "grave" now that I have built a test case that
reproduces the corruption of a small tag hierarchy - see below. Three sources
of corruption eat up your tags and limit the usage of digikam 4.1 to a single
device with a single database that should never break. Do not leave this path
until this family of bugs is fixed.

On top of my wishlist:
Please help me with my inconsistent tags in thousands of images tagged with
different releases of digikam caused by these bugs and older ones. A simple
tool could help me and many others out of this nightmare.

Requirements: This tool should read all tags that make sense from each file and
copy them to all sections (XMP/IPTC ...) in a consistent way without any root
tags and without any duplications or imports to the database. Remove all
unreadable stuff. The database should be rebuilt from scratch after the
consolidation is complete.
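
The core of such a tool could look roughly like the sketch below. It is only
an illustration of the requirements above, not digikam code: the function
name, the dict layout and the "/" path separator are assumptions, and reading
or writing the actual IPTC/XMP sections is left out. Only the
"_Digikam_root_tag_" name is taken from this report.

```python
ROOT_TAG = "_Digikam_root_tag_"  # root tag name as quoted in this report


def consolidate_keywords(sections, sep="/"):
    """Merge keyword lists from several metadata sections into one clean list.

    sections: dict mapping a section name (e.g. "IPTC", "XMP") to a list of
              keyword strings. The "/" separator is an assumption.
    """
    seen = set()
    merged = []
    for keywords in sections.values():
        for raw in keywords:
            # Split the path and drop empty parts and every root-tag component.
            parts = [p.strip() for p in raw.split(sep)]
            parts = [p for p in parts if p and p != ROOT_TAG]
            if not parts:
                continue  # keyword consisted only of root tags or whitespace
            path = sep.join(parts)
            if path not in seen:  # never keep the same path twice
                seen.add(path)
                merged.append(path)
    return merged


if __name__ == "__main__":
    sample = {
        "IPTC": ["_Digikam_root_tag_/Zeit/2014", "Person"],
        "XMP": ["Zeit/2014", "_Digikam_root_tag_/_Digikam_root_tag_/Person"],
    }
    print(consolidate_keywords(sample))
```

The merged list would then be written back to every section of the file. The
sketch does not decide whether a single keyword and a full path ending in the
same name mean the same tag; that still needs a rule.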

Testcase Explanation:

I used a "clean" install, a new empty MySQL database and openSUSE 13.1 with
digikam 4.1.0-11.8 and KDE 4.13.3 to test keyword tagging. I am convinced that
SQLite shows similar results, but I had no time to check this.

1. Copy test images to the folder with your collection

Download the zip file and unpack it to a local folder.
Copy the sample files into your image folder, but do not copy the screenshots.

2. Subfolder: 0_inconsistent_writing_reading/album1_with_single_root_tags

bug i:
There is a bug in "Read metadata" that duplicates hierarchies when the IPTC
and XMP sections contain different keywords, or when "full path keywords" are
mixed with single keywords. To reproduce, select all images in album1 and run
"Read metadata from images".

bug ii:
There is another bug in "Read metadata": whenever tags are found that are not
in the hierarchy, a "_Digikam_root_tag_" is created in the GUI that was not in
the image's tag list. This also happens when such a tag already exists - in
some cases the two are shown side by side on the same level, most of the time
nested under each other.

To check this, remove all tags from the database using "Tag Manager" and run
"Read metadata from image" on any tagged image. If you try to delete this tag
you lose all others too. The same happens if you add a tagged image from
another PC with keywords that are not already in your tag tree.

Note: The automatic update might behave in another way - but the "Read
metadata" function will always create such unwanted tags. 

bug iii:
Beware - duplicated tag branches in the digikam 4.1 GUI are not always
different tags until you use "write metadata". If you close digikam and open it
again, some of the duplicate tags will disappear, others remain. This is
another annoying bug: GUI view and internal model are out of sync. Many times
this causes a loss of all tags - e.g. if you delete a nested duplicate tag that
is internally identical with the root tag above, you delete the topmost root
tag and with it all your tags are gone!

Even worse - you will not see the loss until you close and open digikam again -
so any tagging operation that writes metadata in the meantime writes chaotic
tags or nothing to your files.

See "digikam_4.1_remove_one_duplicate_tag_branch_before_close.jpg"
    for demonstration

3. Subfolder: 0_inconsistent_writing_reading/album2_with_duplicate_root_tags


... demonstrates the mentioned bugs - I found this one in my collection that
was edited on two different PCs. Some of the tags have duplicate root tags on
top because the tag tree was imported on another PC from the image and later on
more tags were added ... this caused the corrupt tags in the GUI to be written
to the file.

... demonstrates what happens when this metadata is read and written again -
the root tag was not duplicated this time, but the ones without root tag have
one now. Please note that this is undesired behavior - why are top-level tags
added to some tags and not to others? Unpredictable behavior.

4. Subfolder: 0_inconsistent_.../album3_with_no_dk_root_tag_and_duplication

album3 demonstrates the bugs described before on a single file, starting with
an empty tag tree. First two nested tags are added to the file. Then two more
tags are added on the third level. Finally two of the topmost tags are removed
again. The removal of these tags now works without any explicit writing of
metadata ... this is an advantage compared to older versions : )

Everything worked out well - also writing of metadata and reading metadata
again does not cause corruption. The bugs i-iii described above do not apply,
because the hierarchy was created on this PC and is still present!

Corruption starts once you remove some of these tags before reading metadata
again, or if you copy the tagged files to another PC and "read metadata".
This explains why these severe bugs have not been fixed for such a long time.

See: "_DSC2638_5_unwanted_digikam_root_tag_written_to_file_from_gui.JPG"

See also this example to understand creation of nested duplicate root tags: 

4. Subfolder: 1a_move_tag_to_new_position_in_tree_by_moving

This example demonstrates inconsistent IPTC and XMP tags that cause a bad mess
when reading metadata from such a file, because no tag tree will match these
cases - so many duplicate branches are created (I hope this is not expected?)

It shows what happens if one tries to move a tag from the top down to a
subbranch and writes metadata to all related files again (not needed if tags
are stored as single keywords - but who knows in which way tags are stored in a
particular image?) In this case writing metadata works well.

But rereading metadata from this file really surprised me - a Person tag shows
up, why now I can't tell - and even worse: the "Zeit" tag was moved to the top
(why does reading cause a move of a tag?) - and some hidden, old "Zeit"
versions appeared that were not visible before, when we read metadata.

bug iv:  In case of inconsistent IPTC keywords that do not match XMP keywords,
reading metadata will not show all keywords. After some other operations,
reading again will bring up new keywords (hidden in the file). In this case the
position of existing keywords is changed as well while reading - this is
unwanted. This might be an issue of "full path keywords" mixed with "single
keywords".
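
Files affected by this kind of mismatch could be found before reading them
into the database with a check along these lines. This is a rough sketch under
assumptions: the two keyword lists are expected to be extracted already by some
metadata tool, the "/" separator is assumed, and only the last path component
is compared so that "full path keywords" and "single keywords" can be matched.
The function name and sample keywords are made up for illustration.

```python
def keyword_mismatch(iptc, xmp, sep="/"):
    """Report keywords that appear in only one of the two sections.

    Only the last component of each keyword is compared, so a full path
    and a single keyword with the same final name count as a match.
    """
    def leaves(keywords):
        result = set()
        for keyword in keywords:
            leaf = keyword.split(sep)[-1].strip()
            if leaf:  # ignore empty keywords and trailing separators
                result.add(leaf)
        return result

    iptc_leaves = leaves(iptc)
    xmp_leaves = leaves(xmp)
    return {
        "only_in_iptc": sorted(iptc_leaves - xmp_leaves),
        "only_in_xmp": sorted(xmp_leaves - iptc_leaves),
    }


if __name__ == "__main__":
    # Hypothetical keyword lists, not taken from the test images.
    print(keyword_mismatch(["Zeit", "Person"], ["Orte/Wien", "Zeit"]))
```

An image where both lists come back empty is consistent in this simple sense;
anything else is a candidate for the hidden keywords described above.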

4. Subfolder: 1b_reread_with_corrected_hierarchy_duplicates

digikam_4.1_tag_tree_with_missing_br_duplicates_root_when_reading.jpg and

demonstrate the mentioned bug that the GUI is sometimes not in sync with the
internal model - branches that seem to be duplicates are internally not
duplicated - so deleting a nested tag leads to the loss of the whole branch.
This can be avoided if you close and reopen digikam each time a duplication
occurs - to see if it is real or just fake.

5. Subfolder: 

 Several methods to write tags have been applied to these images, and the
"Read metadata from images" function causes a big mess of duplicate tags.
 There is also an example of a tagged file that contains no root tag - but if
it resides in an album with an image with a root tag, it will get one the next
time the metadata of this album is written to all files.

bug v: unwanted root tags infect other images with root tags if they are
changed together.

6. Subfolder: 3_write_from_duplicated_hierarchy_to_file

Digikam used (hopefully this does not continue) a mixed strategy to write tags
- sometimes with full path, sometimes not. This causes strange semantics for
duplicate keywords. 

Wish / bug vi:
To avoid chaos: always write keywords with full path, and NEVER write identical
keywords or the same path more than once.
I am not sure if digikam 4.1 meets this requirement.
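
The rule could be sketched as follows. This is not how digikam stores or
writes tags; the parent/name dictionaries, the function names and the "/"
separator are assumptions chosen to keep the example small.

```python
def full_path(tag_id, parent, name, sep="/"):
    """Build the full path of a tag by walking up to the top of the tree."""
    parts = []
    visited = set()
    while tag_id is not None:
        if tag_id in visited:
            raise ValueError("cycle in tag tree")
        visited.add(tag_id)
        parts.append(name[tag_id])
        tag_id = parent.get(tag_id)  # None (or missing) marks a top-level tag
    return sep.join(reversed(parts))


def keywords_to_write(selected_ids, parent, name, sep="/"):
    """Return each selected tag as a full path, every path only once."""
    written = set()
    result = []
    for tag_id in selected_ids:
        path = full_path(tag_id, parent, name, sep)
        if path not in written:
            written.add(path)
            result.append(path)
    return result


if __name__ == "__main__":
    # Hypothetical tree: tags 2 and 5 sit in duplicated "Zeit" branches.
    name = {1: "Zeit", 2: "2014", 3: "Person", 4: "Zeit", 5: "2014"}
    parent = {1: None, 2: 1, 3: None, 4: None, 5: 4}
    print(keywords_to_write([2, 5, 3, 2], parent, name))
```

Because duplicates are detected on the resulting path and not on the internal
tag id, two selected tags from duplicated branches end up as one keyword.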

In this example the same keyword is used at four different positions in the tag
tree (because of other bugs some branches have been duplicated), and all four
have been selected and written to the file. If the GUI view was out of sync,
some of them stand for the same position - but they have been written four
times. I can't tell if the resulting metadata is as expected.

7. Subfolder: 4_remove_inconsistent_tag_close_open

Another example showing that GUI view and internal model are sometimes out of
sync. After accidentally deleting the topmost tag (because it was shown twice)
the tree looks fine - after closing and opening, the whole branch is gone.
