[Digikam-users] Jpeg Comments and encodings

Sun Jan 22 17:23:17 GMT 2012

Hi all

I've been using digiKam for a long time, but one thing I stumble upon 
again and again is interoperability between the various forms of Jpeg 
comments.
I usually view my files in digiKam and Gwenview, as well as in Photoshop 
and FastStone Image Viewer on Windows.
So far I haven't found an acceptable way to tag my images so that the 
comments display correctly most of the time.

I found this old thread explaining some charsets of the various fields:
http://mail.kde.org/pipermail/digikam-users/2006-October/002116.html
It says:
- JFIF is converted from latin1
- EXIF UserComment may provide a charset, else some 'autodetection' 
takes place
- IPTC is converted from latin1
- XMP wasn't supported then...

With some testing I found that digiKam reads the tags in the following 
order:
- Xmp.dc.description
- Xmp.exif.UserComment
- Xmp.tiff.ImageDescription
- JFIF Comment ("Jpeg comment")
- Exif.Photo.UserComment
- Iptc.Application2.Caption (envelope encoding not honored)
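
The lookup order above can be sketched as a simple first-match fallback. This is only an illustration of what I observed in my tests, not digiKam's actual code; the `tags` dict and the `JFIF.Comment` key are made up for the example, the other keys are the exiv2 names.

```python
# Order in which digiKam appeared to consult the comment fields in my tests.
COMMENT_PRIORITY = [
    "Xmp.dc.description",
    "Xmp.exif.UserComment",
    "Xmp.tiff.ImageDescription",
    "JFIF.Comment",  # hypothetical key standing in for the Jpeg comment
    "Exif.Photo.UserComment",
    "Iptc.Application2.Caption",
]


def pick_comment(tags):
    """Return (key, text) for the first non-empty field, or (None, None)."""
    for key in COMMENT_PRIORITY:
        text = tags.get(key)
        if text:
            return key, text
    return None, None
```

With both an Exif and an Xmp comment present, `pick_comment` returns the Xmp.dc.description entry, which matches what the GUI showed me.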

All the Xmp.*.* tags seem to be read and written as UTF8, which is 
correct as far as I know.
However, the JFIF comment is also written as UTF8, which is at least 
questionable, since as far as I know the standard doesn't define any 
charset at all (and this also seems to have changed since the above 
discussion in 2006).
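
To illustrate why the charset of an untagged field matters, here is a small sketch of what happens when writer and reader disagree. The test string is the one I use further down; which viewer assumes which charset is my assumption, not something I have verified for each program.

```python
text = "Commentwithäöü."

utf8_bytes = text.encode("utf-8")      # what digiKam seems to write now
latin1_bytes = text.encode("latin-1")  # what the 2006 thread described

# Each umlaut takes two bytes in UTF-8 but only one in latin1.
print(len(utf8_bytes), len(latin1_bytes))

# A reader that assumes latin1 turns UTF-8 data into mojibake ...
misread = utf8_bytes.decode("latin-1")
print(misread)

# ... and a strict UTF-8 reader rejects latin1 data outright.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False
print(latin1_is_valid_utf8)
```

Since the Jpeg comment carries no charset marker, a reader can only guess which of the two it is looking at.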

a) Now when we come to EXIF, things get hairy:
I've prepared a jpeg file with exiv2 and inserted an 
Exif.Photo.UserComment using Unicode. Reading it back with 
exiv2 -pv image.jpg gives the following (I've added the complete tag 
name to the comment so I can recognize where it comes from later on):
0x9286 Photo        UserComment                 Undefined  88  charset="Unicode" Commentwithäöü. (Exif.Photo.UserComment)

Now when viewing in digiKam, the Xmp.dc.description tag is used in the 
GUI since it's present as well. If I change the text and save again, the 
comment shows up as:
0x9286 Photo        UserComment                 Undefined  23  charset="Ascii" Commentwith???.

So the text appears to have been converted to ISO-8859-1 while the 
charset is declared as Ascii. Isn't that wrong, since the data is 
definitely not ASCII but ISO-8859-1? Why doesn't digiKam use 
charset="Unicode"?
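
For reference, the byte counts in the two dumps fit the Exif layout of UserComment: an 8-byte character code followed by the text. Below is a minimal sketch of that layout. `build_user_comment` is a made-up helper, and the byte order of the "Unicode" text is an assumption on my side. Note that 23 bytes only tells us the text is stored with one byte per character, which fits both ASCII with "?" replacements and ISO-8859-1.

```python
# 8-byte character code identifiers defined by the Exif specification.
CHARSET_IDS = {
    "ascii": b"ASCII\x00\x00\x00",
    "unicode": b"UNICODE\x00",
}


def build_user_comment(text, charset, byte_order="little"):
    """Hypothetical helper: return the raw bytes of an Exif UserComment."""
    if charset == "unicode":
        # Assumption: two bytes per character, in the given byte order.
        codec = "utf-16-le" if byte_order == "little" else "utf-16-be"
        body = text.encode(codec)
    elif charset == "ascii":
        # Characters outside ASCII cannot be represented and become "?".
        body = text.encode("ascii", errors="replace")
    else:
        raise ValueError("unsupported charset: " + charset)
    return CHARSET_IDS[charset] + body


original = build_user_comment(
    "Commentwithäöü. (Exif.Photo.UserComment)", "unicode")
rewritten = build_user_comment("Commentwithäöü.", "ascii")
print(len(original), len(rewritten))
```

The two lengths come out as 88 and 23, the same sizes exiv2 reports before and after saving in digiKam.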

b) Iptc.Application2.Caption:
According to that mail from 2006, IPTC data is always encoded/decoded as 
latin1, though in other places I found that one can/should set 
Iptc.Envelope.CharacterSet to declare the character set used. This 
appears to be ignored by digiKam...
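
What I would have expected is roughly the following. This is a sketch only: `decode_iptc_caption` is a hypothetical helper, it handles just the UTF-8 escape sequence (ESC % G) in the envelope's character set field, and the latin1 fallback is taken from the 2006 thread rather than from any specification.

```python
ESC_UTF8 = b"\x1b%G"  # ISO 2022 escape sequence announcing UTF-8


def decode_iptc_caption(caption_bytes, coded_character_set=None):
    """Hypothetical helper: decode a caption using the envelope charset."""
    if coded_character_set == ESC_UTF8:
        return caption_bytes.decode("utf-8")
    # Assumption: fall back to latin1 as described in the 2006 thread.
    return caption_bytes.decode("latin-1")


utf8_caption = "äöü".encode("utf-8")
honored = decode_iptc_caption(utf8_caption, ESC_UTF8)
ignored = decode_iptc_caption(utf8_caption)
print(honored, ignored)
```

With the envelope honored the umlauts survive; with it ignored the same bytes come out as mojibake, which is the kind of result I see.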

c) Question about Xmp "lang"
One thing I still do not understand is the lang="..." attribute in Xmp 
comments. What exactly does it mean? Is it just there to allow multiple 
entries in different languages? Does it affect the encoding at all, or 
is that really always UTF8?
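
To make the question concrete, this is how I understand the structure: the description is a list of alternatives, each labelled with xml:lang, with "x-default" as the fallback entry. The sample packet below is made up and `read_lang_alt` is only a sketch for illustration; nothing in it suggests that the label changes the encoding of the text, but I'd be glad to have that confirmed.

```python
import xml.etree.ElementTree as ET

# Made-up sample of an Xmp description with two language alternatives.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:description>
      <rdf:Alt>
        <rdf:li xml:lang="x-default">Comment with äöü</rdf:li>
        <rdf:li xml:lang="de-DE">Kommentar mit äöü</rdf:li>
      </rdf:Alt>
    </dc:description>
  </rdf:Description>
</rdf:RDF>"""

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_lang_alt(xml_text):
    """Map each xml:lang label to the text of its alternative."""
    root = ET.fromstring(xml_text)
    return {li.get(XML_LANG): li.text for li in root.iter(RDF + "li")}


print(read_lang_alt(SAMPLE))
```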

Thank you very much

Matt