[Digikam-users] unicode chars break xmp sidecars?

Phil philtuckey at free.fr
Fri May 16 00:19:40 BST 2014


Thanks for looking Gilles. This made me think I might be causing the 
problem by something I do to my images, and I found the cause.

The problem is triggered by setting the IPTC record CodedCharacterSet to 
UTF8. For example, with image.jpg which contains no IPTC records, run

   exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8 image.jpg

This creates two IPTC records, CodedCharacterSet (= ESC % G) and 
EnvelopeRecordVersion (= 4). After this, the 
unicode-tag-breaking-sidecars behaviour appears for image.jpg. (One can 
verify that the problem is not caused by the EnvelopeRecordVersion record.)
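
For anyone wanting to test for this record programmatically, here is a minimal Python sketch of how the raw CodedCharacterSet value might be interpreted. The function names are hypothetical, and the Latin-1 fallback is an assumption made for this sketch only, not something taken from the IIM specification or from how digikam or exiv2 behave.

```python
# Sketch only: interpret the raw value of IPTC DataSet 1:90 (CodedCharacterSet).
ESC = b"\x1b"
UTF8_DESIGNATION = ESC + b"%G"  # the ISO 2022 escape sequence "ESC % G"


def declares_utf8(coded_character_set: bytes) -> bool:
    """Return True if the raw 1:90 value is exactly the UTF-8 designation."""
    return coded_character_set == UTF8_DESIGNATION


def decode_iptc_text(raw: bytes, coded_character_set: bytes = b"") -> str:
    """Decode a character-oriented DataSet value from records 2-6.

    Assumption for this sketch: if 1:90 is absent or is not the UTF-8
    designation, fall back to Latin-1.
    """
    if declares_utf8(coded_character_set):
        return raw.decode("utf-8")
    return raw.decode("latin-1")
```

With the record set as above, decode_iptc_text("café".encode("utf-8"), UTF8_DESIGNATION) gives back "café".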

I was led to set IPTC:codedcharacterset=utf8 by advice in the exiftool FAQ:
   http://www.sno.phy.queensu.ca/~phil/exiftool/faq.html#Q10
This usage appears to be consistent with the IPTC IIM specification 
pointed to from that page:
   http://www.iptc.org/std/IIM/4.1/specification/IIMV4.1.pdf
(I quote the relevant part below.)

So it looks like digikam should continue to write the xmp sidecars as 
usual, when this record is set to utf8. Am I missing something?

I tried tagging such images in darktable, which I believe also uses 
exiv2, and it wrote the sidecars correctly, which suggests the problem 
is specific to digikam.
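
A rough way to check a batch of sidecars for the symptom (only the XML declaration line gets written) could look like the Python sketch below. The function name is hypothetical, and treating an empty file as truncated is an assumption of this sketch.

```python
import re

# Matches text that consists of nothing but an XML declaration.
XML_DECLARATION_ONLY = re.compile(r"<\?xml[^>]*\?>")


def sidecar_looks_truncated(text: str) -> bool:
    """Return True if the sidecar text is empty or holds only the XML declaration."""
    stripped = text.strip()
    return stripped == "" or XML_DECLARATION_ONLY.fullmatch(stripped) is not None
```

One would read each sidecar as UTF-8 text and pass its contents to sidecar_looks_truncated(); a sidecar containing any XMP packet after the declaration is reported as not truncated.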

Best, Philip


Quote from IPTC IIM specification v.4 rev.1:
"1.90 Coded Character Set
Optional, not repeatable, up to 32 octets, consisting of one or more 
control functions used for the announcement, invocation or designation 
of coded character sets. The control functions follow the ISO 2022 
standard and may consist of the escape control character and one or more 
graphic characters. For more details see Appendix C, the IPTC-NAA Code 
Library.
The control functions apply to character oriented DataSets in records 
2-6. They also apply to record 8, unless the objectdata explicitly, or 
the File Format implicitly, defines character sets otherwise.
If this DataSet contains the designation function for Unicode in UTF-8 
then no other announcement, designation or invocation functions are 
permitted in this DataSet or in records 2-6.
..."


On 15/05/14 22:58, Gilles Caulier wrote:
> I tried to reproduce the dysfunction here (Linux) and "Café" appears fine
> in the sidecar file.
>
> Sounds like a dysfunction in Exiv2, which is delegated to write the sidecar content.
>
> Best
>
> Gilles Caulier
>
> 2014-05-15 22:02 GMT+02:00 Phil <philtuckey at free.fr>:
>> Does anyone else see the following behaviour? If I assign a tag containing a
>> (non-ascii) unicode character to an image, for example "café", digikam will
>> write the tag to the image file perfectly well, but fails to write the xmp
>> sidecar correctly. Only the first line of the sidecar is written:
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> I am on OSX 10.9.2, digikam 3.5.0 (current macports).
>>
>> Thanks
>> _______________________________________________
>> Digikam-users mailing list
>> Digikam-users at kde.org
>> https://mail.kde.org/mailman/listinfo/digikam-users
