[Okular-devel] [okular] [Bug 344849] PDF metadata is displayed incorrectly in File -> Properties

Sun Mar 15 03:29:22 UTC 2015

https://bugs.kde.org/show_bug.cgi?id=344849

--- Comment #8 from 4aa7f31e at opayq.com ---
I have read some parts of the PDF standard (ISO 32000-1:2008) and can confirm
the assessment in the Sejda bug report (which has been closed in the meantime).

According to section 7.9.2.2 "Text String Type" of ISO 32000-1:2008, fields
such as the "Author" field in the example document must be represented as a PDF
"text string", which can be encoded either as UTF-16BE with a leading byte order
mark or in PDFDocEncoding. PDFDocEncoding can encode all Latin-1 characters;
however, it is NOT the same as either ISO Latin-1 or Windows-1252!
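As a rough illustration of that rule: a reader only has to check for the
UTF-16BE byte order mark (0xFE 0xFF) and otherwise map each byte through the
PDFDocEncoding table. This is a minimal sketch, not Okular's or Poppler's code.
The function name is hypothetical, it assumes PDF-level escapes (backslash
sequences, hex digits) have already been resolved, and the table lists only the
codes where Table D.2 differs from Latin-1; undefined codes are not rejected.

```python
# Hypothetical sketch of decoding a PDF "text string" (ISO 32000-1, 7.9.2.2).
# Assumes `raw` is the string's bytes after PDF-level escapes are resolved.

# PDFDocEncoding codes that differ from ISO Latin-1 (Annex D, Table D.2).
# Unlisted codes fall through to Latin-1; undefined codes are not rejected.
_PDFDOC_OVERRIDES = {
    0x18: "\u02d8", 0x19: "\u02c7", 0x1A: "\u02c6", 0x1B: "\u02d9",
    0x1C: "\u02dd", 0x1D: "\u02db", 0x1E: "\u02da", 0x1F: "\u02dc",
    0x80: "\u2022", 0x81: "\u2020", 0x82: "\u2021", 0x83: "\u2026",
    0x84: "\u2014", 0x85: "\u2013", 0x86: "\u0192", 0x87: "\u2044",
    0x88: "\u2039", 0x89: "\u203a", 0x8A: "\u2212", 0x8B: "\u2030",
    0x8C: "\u201e", 0x8D: "\u201c", 0x8E: "\u201d", 0x8F: "\u2018",
    0x90: "\u2019", 0x91: "\u201a", 0x92: "\u2122", 0x93: "\ufb01",
    0x94: "\ufb02", 0x95: "\u0141", 0x96: "\u0152", 0x97: "\u0160",
    0x98: "\u0178", 0x99: "\u017d", 0x9A: "\u0131", 0x9B: "\u0142",
    0x9C: "\u0153", 0x9D: "\u0161", 0x9E: "\u017e", 0xA0: "\u20ac",
}

UTF16BE_BOM = b"\xfe\xff"


def decode_pdf_text_string(raw: bytes) -> str:
    """UTF-16BE if the string starts with a BOM, otherwise PDFDocEncoding."""
    if raw.startswith(UTF16BE_BOM):
        return raw[len(UTF16BE_BOM):].decode("utf-16-be")
    # PDFDocEncoding is single-byte: each byte maps to exactly one character;
    # chr(b) is the Latin-1 character for byte b.
    return "".join(_PDFDOC_OVERRIDES.get(b, chr(b)) for b in raw)
```

For example, decode_pdf_text_string(b"\x85\x8b") yields "–‰", and
decode_pdf_text_string(b"\xfe\xff\x20\x12") yields U+2012.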

The mapping of PDFDocEncoding bytes to characters is defined in Annex D, Table
D.2 "Latin Character Set and Encodings". Note that both PDFDocEncoding and
Windows-1252 can in fact encode the characters "–‰" (EN DASH and PER MILLE
SIGN), although at different byte values. Thus, the string need not be encoded
as UTF-16BE, and the provided PDF document is valid: the characters "–‰" are
correctly encoded as 0x85 0x8B in PDFDocEncoding (in Windows-1252 they would be
0x96 0x89). It seems that Okular does not parse PDFDocEncoded text strings
correctly.
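To see why decoding such a string with the wrong table goes wrong, compare
what the same two bytes mean elsewhere. This report does not establish which
decoding (if any) Okular falls back to; the snippet only shows what two
plausible misreadings, Windows-1252 and ISO Latin-1, would produce.

```python
# Bytes of "–‰" (EN DASH, PER MILLE SIGN) as stored in PDFDocEncoding (Table D.2).
pdfdoc_bytes = b"\x85\x8b"

# Misreading 1: Windows-1252 maps 0x85 to an ellipsis and 0x8B to a guillemet.
as_cp1252 = pdfdoc_bytes.decode("cp1252")

# Misreading 2: ISO Latin-1 maps 0x80-0x9F to invisible C1 control characters.
as_latin1 = pdfdoc_bytes.decode("latin-1")

# Windows-1252 stores the same two characters at different byte values.
cp1252_bytes = "\u2013\u2030".encode("cp1252")

print(repr(as_cp1252))     # '…‹'
print(repr(as_latin1))     # '\x85\x8b'
print(cp1252_bytes.hex())  # 9689
```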

(The other example document works correctly because U+2012 (FIGURE DASH) cannot
be encoded in PDFDocEncoding, so UTF-16BE was used, which Okular reads
correctly.)
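For comparison, the UTF-16BE form of such a text string is simply the byte
order mark followed by the UTF-16BE code units. A minimal sketch; the function
name is hypothetical and not taken from any PDF library:

```python
UTF16BE_BOM = b"\xfe\xff"


def encode_utf16be_text_string(text: str) -> bytes:
    """Encode text as a PDF text string: BOM followed by UTF-16BE code units."""
    return UTF16BE_BOM + text.encode("utf-16-be")


# U+2012 FIGURE DASH has no PDFDocEncoding code, so UTF-16BE is required.
encoded = encode_utf16be_text_string("\u2012")
print(encoded.hex())  # feff2012
```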
