[digiKam-users] ImageDescription field
Remco Viëtor
remco.vietor at wanadoo.fr
Thu Apr 19 10:27:56 BST 2018
On Thursday 19 April 2018 10:43:41 CEST meku wrote:
> Strange, the exiv2 command line appears to work, e.g.:
> exiv2 -M"set Exif.Image.ImageDescription 'ミスタードーナツ'" FILE.JPG
>
> I filed a bug for Exif.Image.ImageDescription field not updating in DK6,
> https://bugs.kde.org/show_bug.cgi?id=393283
>
> On 19 April 2018 at 17:48, Gilles Caulier <caulier.gilles at gmail.com> wrote:
> > Hi,
> >
> > Exif does not support any special character encoding, such as UTF-8.
> >
> > Using XMP is the right way.
> >
> > Note: IPTC is limited to Latin1 (extended ASCII). Take care there too.
> >
> > Gilles Caulier
> >
> > 2018-04-19 9:43 GMT+02:00 meku <digikam at meku.org>:
> >> I discovered that my UTF captions appear to be corrupted using
> >> Digikam-5.9.0, but only in the Exif.Image.ImageDescription field.
> >>
> >> Using exiv2 command line it appears I can write UTF caption to this
> >> field.
> >>
> >> I tried loading up Digikam-6.0.0 and it appears to ignore the field when
> >> writing, even though the default settings in Metadata>Advanced are set to
> >> write to the field.
> >>
> >> Is this an issue with Digikam or is this a limitation of EXIF?
I think the key here is "appears" to work. According to the Exif standard,
text tags such as ImageDescription have type ASCII and can only hold (7-bit)
ASCII characters, but that does not mean that programs reading and writing
the tags scrupulously respect that.

At least in standard C and C++, the easiest way is to take the string the user
gives as a sequence of bytes and write that to the metadata, and then to read
the contents back from the metadata as a sequence of bytes, all without
worrying about the encoding (which is not all that straightforward to handle
in those languages). Somewhere there must be a translation to the encoding the
user wants, but that's not the problem of the library handling the metadata.

As long as there aren't any unexpected NUL (\000) bytes in such a sequence, that
may appear to work correctly, *as long as the same encoding is used on writing
and on reading*. But if the encodings for reading and writing differ, you'll get
garbled output, and there is *no* sure way to determine which encoding was
originally used (though you can find one that's 'close enough').

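To make that concrete, here is a minimal Python sketch. It is only an illustration of the byte round trip described above, not how exiv2 or digiKam handle the tag internally, and the variable names are made up for the example. The caption is turned into bytes once, then read back with the same encoding and with a different one (Latin-1 is picked arbitrarily as the "wrong" encoding).

```python
caption = "ミスタードーナツ"

# Writing: the program stores whatever bytes its own encoding produces.
stored = caption.encode("utf-8")

# No NUL bytes, so a C-style string routine would not cut the value short.
has_nul = b"\x00" in stored

# Reading with the same encoding gives back the original text.
same = stored.decode("utf-8")

# Reading with a different encoding does not fail, it just yields
# one garbled character per stored byte.
garbled = stored.decode("latin-1")

print("NUL bytes present:", has_nul)
print("same encoding round-trips:", same == caption)
print("different encoding round-trips:", garbled == caption)
```

Note that the Latin-1 read never raises an error, since every byte value is a valid Latin-1 character. That is why the mismatch can go unnoticed until someone looks at the text.
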
And the character encoding can change between writing and reading without the
user realising it: a few years ago, my Linux distro switched to UTF-8 as the
default, but a lot of my older files are in one of the ISO-8859 encodings.
Result: those appear garbled for any character outside the ASCII range. And it
might get even worse between operating systems.

Personally, I think it might be a good thing if digiKam 6 refused to write
non-ASCII data to Exif tags, provided the information can be written to the
corresponding XMP tags (which, as far as I know, is always possible).

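As a rough sketch of that policy in Python (purely hypothetical, not digiKam's actual code): the function names are invented for the example, and pairing Exif.Image.ImageDescription with Xmp.dc.description is my assumption about which XMP tag would correspond.

```python
def is_exif_ascii_safe(text: str) -> bool:
    """Return True if the text contains only 7-bit ASCII characters."""
    return text.isascii()  # str.isascii() needs Python 3.7 or later


def choose_targets(caption: str) -> list:
    """Pick the tags a caption would be written to (hypothetical policy)."""
    # XMP is always written, since it can hold any Unicode text.
    targets = ["Xmp.dc.description"]
    # The Exif tag is only written when the caption is plain ASCII.
    if is_exif_ascii_safe(caption):
        targets.append("Exif.Image.ImageDescription")
    return targets


print(choose_targets("Mister Donut"))
print(choose_targets("ミスタードーナツ"))
```

With this approach a non-ASCII caption is never lost, it just does not end up in a field where readers might decode it with the wrong encoding.
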
Remco