[digiKam-users] ImageDescription field

Thu Apr 19 10:27:56 BST 2018

On Thursday 19 April 2018 10:43:41 CEST meku wrote:
> Strange, the exiv2 command line appears to work, e.g.:
> exiv2 -M"set Exif.Image.ImageDescription 'ミスタードーナツ'" FILE.JPG
> 
> I filed a bug for Exif.Image.ImageDescription field not updating in DK6,
> https://bugs.kde.org/show_bug.cgi?id=393283
> 
> On 19 April 2018 at 17:48, Gilles Caulier <caulier.gilles at gmail.com> wrote:
> > Hi,
> > 
> > Exif does not support any special character encoding, such as UTF-8.
> > 
> > Using XMP is the right way.
> > 
> > Note: IPTC is limited to Latin1 (extended ASCII). Take care with that too.
> > 
> > Gilles Caulier
> > 
> > 2018-04-19 9:43 GMT+02:00 meku <digikam at meku.org>:
> >> I discovered that my UTF captions appear to be corrupted using
> >> Digikam-5.9.0, but only in the Exif.Image.ImageDescription field.
> >> 
> >> Using exiv2 command line it appears I can write UTF caption to this
> >> field.
> >> 
> >> I tried loading up Digikam-6.0.0 and it appears to ignore the field when
> >> writing, even though the default settings in Metadata>Advanced are set to
> >> write to the field.
> >> 
> >> Is this an issue with Digikam or is this a limitation of EXIF?

I think the key here is "appears" to work. According to the standard, EXIF 
tags can only use (7-bit) ASCII characters, but that does not mean that 
programs reading and writing the tags scrupulously respect that. 

At least in standard C and C++, the easiest way is to take the string the user 
gives as a sequence of bytes and write that to the metadata, and to read the 
contents back from the metadata as a sequence of bytes, all without worrying 
about the encoding (which is not all that straightforward in those 
languages). Somewhere there must be a translation to the encoding the user 
wants, but that's not the problem of the library handling the metadata.

As long as there aren't any unexpected \000 bytes in such a sequence, that may 
appear to work correctly, *as long as the same encoding is used on writing and 
on reading*. But if the encodings for reading and writing differ, you'll get 
garbled output, and *no* sure way to get the correct encoding (though you can 
find an encoding that's 'close enough').
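
To make that concrete, here is a minimal Python sketch. It is not how exiv2 
or digiKam actually work: write_tag and read_tag are made-up helpers that 
model the metadata library as a plain byte store which never looks at the 
encoding.

```python
def write_tag(text: str, encoding: str) -> bytes:
    """Hypothetical writer: encode the caption, hand raw bytes to the 'library'."""
    return text.encode(encoding)


def read_tag(raw: bytes, encoding: str) -> str:
    """Hypothetical reader: interpret the stored bytes in the reader's encoding."""
    return raw.decode(encoding, errors="replace")


caption = "café"
stored = write_tag(caption, "utf-8")     # b'caf\xc3\xa9', no \000 bytes

same = read_tag(stored, "utf-8")         # reader agrees with the writer
different = read_tag(stored, "latin-1")  # reader assumes another encoding

print(same)       # café
print(different)  # cafÃ©
```

The stored bytes are identical in both cases; only the reader's assumption 
differs, and nothing in the bytes says which assumption is the right one.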

And changing the character encoding between reading and writing can happen 
without the user realising it: a few years ago, my Linux distro switched to 
UTF-8 as the default. But a lot of older files are in one of the ISO encodings. 
Result: those appear garbled for any character outside the ASCII range. And it 
might get even worse between operating systems.
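
The distro switch can be simulated in a few lines of Python. This is only an 
illustration: the caption text is made up, and ISO-8859-1 stands in for 
whichever ISO encoding the old files used.

```python
# Bytes as an older system with an ISO-8859-1 default would have written them.
legacy = "Noël".encode("iso-8859-1")   # b'No\xebl'

# A newer system that assumes UTF-8 cannot decode them strictly...
try:
    legacy.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...and a lenient decode substitutes U+FFFD for the byte it cannot interpret.
lenient = legacy.decode("utf-8", errors="replace")

print(strict_ok)      # False
print(repr(lenient))  # 'No\ufffdl'
```

Pure ASCII text would survive unchanged, which is why the problem only shows 
up for characters outside the ASCII range.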

Personally, I think it might be a good thing if Digikam 6 refuses to write 
non-ASCII data to Exif tags, provided the information can get written to the 
corresponding XMP tags (which afaik is always possible).
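
A rough sketch of the policy I have in mind, in Python. This is not digiKam 
code: plan_caption_write is a hypothetical helper, and pairing the Exif field 
with Xmp.dc.description is my assumption about which XMP tag would correspond.

```python
def plan_caption_write(caption: str) -> dict:
    """Return the tags that should receive the caption (illustrative only)."""
    # XMP is Unicode text, so the caption always goes there.
    targets = {"Xmp.dc.description": caption}
    # Only mirror it into the Exif field when it is pure 7-bit ASCII.
    if caption.isascii():
        targets["Exif.Image.ImageDescription"] = caption
    return targets


print(sorted(plan_caption_write("Mister Donut")))
print(sorted(plan_caption_write("ミスタードーナツ")))
```

With an ASCII caption both tags are listed; with the Japanese caption only 
the XMP tag is, so nothing ambiguous ever lands in the Exif field.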

Remco