[Digikam-users] Jpeg Comments and encodings
Matthias Keller
linux at matthias-keller.ch
Sun Jan 22 17:23:17 GMT 2012
Hi all
I've been using digikam for a long time but one thing I always stumble
upon again and again is interoperability concerning the various forms of
Jpeg Comments.
I usually view my files in Digikam and Gwenview as well as Photoshop and
Faststone ImageViewer on Windows.
So far I haven't found an acceptable way to tag my images so that the
comments display correctly most of the time.
I found this old thread explaining some charsets of the various fields:
http://mail.kde.org/pipermail/digikam-users/2006-October/002116.html
It says:
- JFIF is converted from latin1
- EXIF UserComment may provide a charset, else some 'autodetection'
takes place
- IPTC is converted from latin1
- XMP wasn't supported then...
With some testing I found that digiKam reads the tags in the following
order:
- Xmp.dc.description
- Xmp.exif.UserComment
- Xmp.tiff.ImageDescription
- JFIF Comment ("Jpeg comment")
- Exif.Photo.UserComment
- Iptc.Application2.Caption (envelope encoding not honored)
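To make the observed lookup order concrete, here is a minimal Python sketch of a "first non-empty tag wins" selection. The list is only the order found in the testing above, not something taken from digiKam's source, and `READ_ORDER` and `pick_comment` are hypothetical names used for illustration.

```python
# Order observed in testing; an assumption, not digiKam's actual code.
READ_ORDER = [
    "Xmp.dc.description",
    "Xmp.exif.UserComment",
    "Xmp.tiff.ImageDescription",
    "JFIF Comment",
    "Exif.Photo.UserComment",
    "Iptc.Application2.Caption",
]


def pick_comment(tags):
    """Return (tag_name, text) for the first non-empty tag in READ_ORDER.

    `tags` is a plain dict mapping tag names to already decoded strings.
    Returns None when none of the known tags carries a comment.
    """
    for name in READ_ORDER:
        text = tags.get(name)
        if text:
            return name, text
    return None
```

With such a rule, a file carrying both Xmp.dc.description and Exif.Photo.UserComment would always show the XMP text in the GUI.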
All the Xmp.*.* tags seem to be read and written as UTF-8, which is
correct as far as I know.
However, the JFIF comment is written as UTF-8, which is at least
questionable, as the standard doesn't define any charset at all as far
as I know (and this also seems to have changed since the above discussion
in 2006).
a) Now when we come to EXIF, things get hairy:
I've prepared a JPEG file with exiv2 and inserted an
Exif.Photo.UserComment using Unicode (read back with exiv2 -pv image.jpg;
I've added the complete tag name to the comment to recognize where it
comes from later on):
0x9286 Photo UserComment Undefined 88
charset="Unicode" Commentwithäöü. (Exif.Photo.UserComment)
Now when viewing in digiKam, the Xmp.dc.description tag is used in the
GUI since it's present as well. If I change the text and save again, the
comment shows up as:
0x9286 Photo UserComment Undefined 23
charset="Ascii" Commentwith???.
Thus the text was converted to ISO-8859-1 and the charset specified as
Ascii - isn't that wrong, since it's definitely not ASCII but
ISO-8859-1? Why doesn't digiKam use charset="Unicode"?
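The byte counts in the two exiv2 dumps above can be reproduced with a small Python sketch. In Exif, UserComment starts with an 8-byte character code identifier followed by the text. The sketch assumes that "Unicode" text is stored as 2 bytes per character (UTF-16, little-endian here) and that the second dump is Latin-1 text behind an ASCII identifier, as described above. `encode_user_comment` is a hypothetical helper, not an exiv2 or digiKam function.

```python
# 8-byte character code identifiers from the Exif specification.
CHARSET_IDS = {
    "Ascii": b"ASCII\x00\x00\x00",
    "Jis": b"JIS\x00\x00\x00\x00\x00",
    "Unicode": b"UNICODE\x00",
    "Undefined": b"\x00" * 8,
}


def encode_user_comment(text, charset_id, codec):
    """Build raw UserComment bytes: 8-byte identifier plus encoded text.

    `codec` is the Python codec used for the payload. Nothing here checks
    that it matches `charset_id`, which is exactly the mismatch in question.
    """
    return CHARSET_IDS[charset_id] + text.encode(codec)


# First dump: declared Unicode, 2 bytes per character (byte order assumed).
unicode_comment = encode_user_comment(
    "Commentwith\u00e4\u00f6\u00fc. (Exif.Photo.UserComment)",
    "Unicode",
    "utf-16-le",
)

# Second dump, as interpreted above: Latin-1 bytes declared as ASCII.
mislabeled_comment = encode_user_comment(
    "Commentwith\u00e4\u00f6\u00fc.", "Ascii", "latin-1"
)

print(len(unicode_comment), len(mislabeled_comment))
```

The lengths come out as 88 and 23, matching the dumps. The payload of the second value is not valid ASCII, which would explain why a reader that trusts the identifier shows "???" for the umlauts.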
b) Iptc.Application2.Caption:
According to that mail from 2006, IPTC data is always encoded/decoded as
latin1, though in other places I found that one should/can set
Iptc.Envelope.CharacterSet to specify the character set used. This
appears to be ignored by digiKam...
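As a rough Python sketch of what honoring the envelope charset could look like: Iptc.Envelope.CharacterSet holds an ISO 2022 escape sequence, and ESC % G announces UTF-8. The Latin-1 fallback follows the 2006 thread and is an assumption, and `decode_iptc_caption` is a hypothetical helper rather than anything digiKam or exiv2 provides.

```python
UTF8_ESCAPE = b"\x1b%G"  # ISO 2022 escape sequence that announces UTF-8


def decode_iptc_caption(caption_bytes, envelope_charset=None):
    """Decode raw Iptc.Application2.Caption bytes to text.

    Uses UTF-8 when the envelope announces it; otherwise falls back to
    Latin-1 (assumed default, per the 2006 thread).
    """
    if envelope_charset == UTF8_ESCAPE:
        return caption_bytes.decode("utf-8")
    return caption_bytes.decode("latin-1")


raw = "\u00e4\u00f6\u00fc".encode("utf-8")       # "äöü" stored as UTF-8
honored = decode_iptc_caption(raw, UTF8_ESCAPE)  # envelope respected
ignored = decode_iptc_caption(raw)               # envelope ignored
```

Ignoring the envelope turns each umlaut into two Latin-1 characters, which is the kind of garbling that shows up when programs disagree about this field.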
c) Question about Xmp "lang"
One thing I still do not understand is the lang="..." attribute in Xmp
comments - what exactly is its meaning? Is it just to add multiple
entries using different languages? Does this affect encoding at all or
is it really always UTF-8?
Thank you very much
Matt