[Digikam-users] unicode chars break xmp sidecars?

Gilles Caulier caulier.gilles at gmail.com
Fri May 16 06:32:45 BST 2014


With digiKam 4.0.0, just released, I fixed this entry in bugzilla:

https://bugs.kde.org/show_bug.cgi?id=159220

... which adds support for UTF-8 in IPTC.

Please update when you can and test. If the problem is still there for you,
file a new report in the KDE bugzilla.

Thanks in advance

Gilles Caulier

2014-05-16 1:19 GMT+02:00 Phil <philtuckey at free.fr>:
> Thanks for looking Gilles. This made me think I might be causing the problem
> by something I do to my images, and I found the cause.
>
> The problem is triggered by setting the IPTC record CodedCharacterSet to
> UTF8. For example, with image.jpg which contains no IPTC records, run
>
>   exiftool -tagsfromfile @ -iptc:all -codedcharacterset=utf8 image.jpg
>
> This creates two IPTC records, CodedCharacterSet (= ESC % G) and
> EnvelopeRecordVersion (= 4). After this, the unicode-tag-breaking-sidecars
> behaviour appears for image.jpg. (One can verify that the problem is not
> caused by the EnvelopeRecordVersion record.)
>
> I was led to set IPTC:codedcharacterset=utf8 by advice in the exiftool FAQ:
>   http://www.sno.phy.queensu.ca/~phil/exiftool/faq.html#Q10
> This usage appears to be consistent with the IPTC IIM specification pointed
> to from that page:
>   http://www.iptc.org/std/IIM/4.1/specification/IIMV4.1.pdf
> (I quote the relevant part below.)
>
> So it looks like digikam should continue to write the xmp sidecars as usual,
> when this record is set to utf8. Am I missing something?
>
> I tried tagging such images in darktable, which I believe also uses exiv2,
> and it wrote the sidecars correctly, which suggests the problem is specific
> to digikam.
>
> Best Philip
>
>
> Quote from IPTC IIM specification v.4 rev.1:
> "1.90 Coded Character Set
> Optional, not repeatable, up to 32 octets, consisting of one or more control
> functions used for the announcement, invocation or designation of coded
> character sets. The control functions follow the ISO 2022 standard and may
> consist of the escape control character and one or more graphic characters.
> For more details see Appendix C, the IPTC-NAA Code Library.
> The control functions apply to character oriented DataSets in records 2-6.
> They also apply to record 8, unless the objectdata explicitly, or the File
> Format implicitly, defines character sets otherwise.
> If this DataSet contains the designation function for Unicode in UTF-8 then
> no other announcement, designation or invocation functions are permitted in
> this DataSet or in records 2-6.
> ..."
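
For readers who want to inspect the record discussed above, here is a minimal
Python sketch of how a CodedCharacterSet DataSet (1:90) with the value ESC % G
sits in a raw IIM block, and how it changes the decoding of a keyword such as
"café". The helper names (make_dataset, parse_datasets, declares_utf8,
keywords) are hypothetical, the sample block is built by hand rather than read
from a real image, and the Latin-1 fallback is only an assumption about what a
reader might do when 1:90 is absent. It is not how digiKam, Exiv2 or exiftool
implement this.

```python
import struct

ESC_UTF8 = b"\x1b%G"  # ISO 2022 designation of UTF-8: ESC % G


def make_dataset(record: int, dataset: int, data: bytes) -> bytes:
    """Encode one standard IIM DataSet: 0x1C, record, dataset, 2-byte length, data."""
    return struct.pack(">BBBH", 0x1C, record, dataset, len(data)) + data


def parse_datasets(block: bytes):
    """Yield (record, dataset, data) for each standard DataSet in a raw IIM block."""
    pos = 0
    while pos + 5 <= len(block) and block[pos] == 0x1C:
        record, dataset, length = struct.unpack_from(">BBH", block, pos + 1)
        if length & 0x8000:
            # Bit 15 set means an extended DataSet; out of scope for this sketch.
            raise ValueError("extended DataSet lengths are not handled here")
        start = pos + 5
        yield record, dataset, block[start:start + length]
        pos = start + length


def declares_utf8(block: bytes) -> bool:
    """True if DataSet 1:90 (CodedCharacterSet) holds exactly ESC % G."""
    return any(
        record == 1 and dataset == 90 and data == ESC_UTF8
        for record, dataset, data in parse_datasets(block)
    )


def keywords(block: bytes) -> list:
    """Decode the 2:25 (Keywords) DataSets, honouring 1:90 when it declares UTF-8."""
    # Assumption: fall back to Latin-1 when no UTF-8 declaration is present.
    encoding = "utf-8" if declares_utf8(block) else "latin-1"
    return [
        data.decode(encoding)
        for record, dataset, data in parse_datasets(block)
        if (record, dataset) == (2, 25)
    ]


# Hand-built sample: EnvelopeRecordVersion = 4, CodedCharacterSet = ESC % G,
# and one keyword stored as UTF-8 bytes.
sample = (
    make_dataset(1, 0, struct.pack(">H", 4))
    + make_dataset(1, 90, ESC_UTF8)
    + make_dataset(2, 25, "café".encode("utf-8"))
)

if __name__ == "__main__":
    print(declares_utf8(sample))
    print(keywords(sample))
```

Without the 1:90 DataSet the same keyword bytes would be decoded with the
fallback encoding and come out garbled, which is why declaring UTF-8 there is
the recommended practice.
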
>
>
>
> On 15/05/14 22:58, Gilles Caulier wrote:
>>
>> I tried to reproduce the dysfunction here (Linux) and "Café" appears fine
>> in the sidecar file.
>>
>> It sounds like a dysfunction in Exiv2, to which writing the sidecar
>> content is delegated.
>>
>> Best
>>
>> Gilles Caulier
>>
>> 2014-05-15 22:02 GMT+02:00 Phil <philtuckey at free.fr>:
>>>
>>> Does anyone else see the following behaviour? If I assign a tag containing
>>> a (non-ascii) unicode character to an image, for example "café", digikam
>>> will write the tag to the image file perfectly well, but fails to write the
>>> xmp sidecar correctly. Only the first line of the sidecar is written:
>>> <?xml version="1.0" encoding="UTF-8"?>
>>>
>>> I am on OSX 10.9.2, digikam 3.5.0 (current macports).
>>>
>>> Thanks
>>> _______________________________________________
>>> Digikam-users mailing list
>>> Digikam-users at kde.org
>>> https://mail.kde.org/mailman/listinfo/digikam-users
>>
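
When testing after an update, the symptom reported at the start of this thread
(a sidecar that contains only the XML declaration) can be checked mechanically.
The following is a minimal Python sketch, not part of digiKam or Exiv2: the
function name sidecar_looks_complete is hypothetical, and the sample sidecar is
a hand-written approximation of an XMP packet, assuming only that a complete
sidecar is well-formed XML containing an rdf:RDF element.

```python
import xml.etree.ElementTree as ET

RDF_TAG = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def sidecar_looks_complete(content: bytes) -> bool:
    """Return True if the sidecar parses as XML and contains an rdf:RDF element."""
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        # A file holding only the XML declaration has no root element.
        return False
    return root.tag == RDF_TAG or root.find(f".//{RDF_TAG}") is not None


# The broken output described in the thread: only the first line is written.
TRUNCATED = b'<?xml version="1.0" encoding="UTF-8"?>\n'

# Hand-written stand-in for a complete sidecar carrying the tag "café".
COMPLETE = """<?xml version="1.0" encoding="UTF-8"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:subject><rdf:Bag><rdf:li>café</rdf:li></rdf:Bag></dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>""".encode("utf-8")

if __name__ == "__main__":
    print(sidecar_looks_complete(TRUNCATED))
    print(sidecar_looks_complete(COMPLETE))
```

To check a real file, read it in binary mode and pass the bytes to the
function; the path of the sidecar depends on the local setup.
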
