[okular] [Bug 331697] can't fill out pdf form

Michael Weghorn bugzilla_noreply at kde.org
Wed Oct 11 12:29:55 UTC 2017


https://bugs.kde.org/show_bug.cgi?id=331697

--- Comment #10 from Michael Weghorn <m.weghorn at posteo.de> ---
I analysed the problem with the attached PDF form. As far as I understand it so
far, the root cause is basically an "incorrect" PDF file, not Poppler.

For easier analysis in a text editor, I created a "cleaned" version of the
document using the command "mutool clean -d -a bug331697.pdf", which
decompresses streams and ASCII-encodes binary streams. The resulting PDF
document is attached as file "bug331697_MUTOOL_CLEANED.pdf".


The PDF specification
(https://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf)
describes how appearance streams for variable text must be created (see
p. 677ff). Extract:

[Start quote]

"For non-rich text fields, the appearance stream—which, like all appearance
streams, is a form XObject—has the contents of its form dictionary initialized
as follows:
• The resource dictionary (Resources) is created using resources from the
  interactive form dictionary’s DR entry (see Table 8.67); see also
  implementation note 118 in Appendix H.
• The lower-left corner of the bounding box (BBox) is set to coordinates (0, 0)
  in the form coordinate system. The box’s top and right coordinates are taken
  from the dimensions of the annotation rectangle (the Rect entry in the widget
  annotation dictionary).
• All other entries in the appearance stream’s form dictionary are set to their
  default values (see Section 4.9, “Form XObjects”).

[...]

The default appearance string (DA) contains any graphics state or text state
operators needed to establish the graphics state parameters, such as text size
and color, for displaying the field’s variable text. Only operators that are
allowed within text objects may occur in this string (see Figure 4.1 on page
197). At a minimum, the string must include a Tf (text font) operator along
with its two operands, font and size. The specified font value must match a
resource name in the Font entry of the default resource dictionary (referenced
from the DR entry of the interactive form dictionary; see Table 8.67)."

[End quote]
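
To make the "Tf" requirement from the quote concrete, here is a minimal Python
sketch (not Poppler code) that extracts the font name and size from a DA
string. The function name "parse_da_font" and the regular expression are my
own assumptions for illustration; a real parser would have to tokenize PDF
names and numbers properly.

```python
import re

# Simplified pattern for "/FontName size Tf" inside a DA string.
# Assumption: the font name contains no whitespace or PDF delimiters.
_TF_PATTERN = re.compile(r"/(\S+)\s+([+-]?(?:\d+\.?\d*|\.\d+))\s+Tf")


def parse_da_font(da):
    """Return (font_name, size) from a DA string, or None if there is no Tf."""
    match = _TF_PATTERN.search(da)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


print(parse_da_font("/Helvetica 10 Tf 0 g"))  # ('Helvetica', 10.0)
print(parse_da_font("/Helv 0 Tf 0 g "))       # ('Helv', 0.0)
print(parse_da_font("0 g"))                   # None
```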


The corresponding object for the first form field in the ("cleaned") PDF file,
"Startbahnhof", is the following widget annotation:

~~~
417 0 obj
<<
  /DA (/Helvetica 10 Tf 0 g)
  /F 4
  /FT /Tx
  /Ff 12582912
  /MK 473 0 R
  /P 370 0 R
  /Rect [ 98.7881 466.621 430.018 482.513 ]
  /StructParent 5
  /Subtype /Widget
  /T (S1F4)
  /TU (Startbahnhof)
  /Type /Annot
  /V <>
>>
endobj
~~~

It contains a "DA" (default appearance) entry of "/Helvetica 10 Tf 0 g".

As described in the quote above, the "DR" entry from the interactive form
dictionary is used to initialize the resources in the appearance stream to be
constructed. The interactive form dictionary is the following object:

~~~
411 0 obj
<<
  /DA (/Helv 0 Tf 0 g )
  /DR <<
    /Encoding <<
      /PDFDocEncoding 91 0 R
    >>
    /Font <<
      /Helv 90 0 R
      /ZaDb 435 0 R
    >>
  >>
  /Fields [ 89 0 R 50 0 R 87 0 R 88 0 R 62 0 R 63 0 R 81 0 R 82 0 R
      66 0 R 67 0 R 40 0 R 41 0 R 68 0 R 414 0 R 415 0 R 416 0 R
      418 0 R 417 0 R 419 0 R 420 0 R 421 0 R 69 0 R 422 0 R 423 0 R
      424 0 R 425 0 R 426 0 R 51 0 R 52 0 R 70 0 R 64 0 R 43 0 R
      65 0 R 427 0 R 447 0 R 446 0 R 445 0 R 444 0 R 443 0 R 442 0 R
      441 0 R 440 0 R 439 0 R 438 0 R 437 0 R 436 0 R 434 0 R
      433 0 R 432 0 R 431 0 R 430 0 R 429 0 R 428 0 R 71 0 R 38 0 R ]
  /SigFlags 2
>>
endobj
~~~

The contained default resources ("DR" entry) do not contain a font called
"Helvetica", which is the name used in the "DA" entry of the form field; they
only contain one called "Helv".
For that reason, the appearance stream is not "properly" created, which leads
to the text not being shown in the filled-in form.
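
The mismatch can be reproduced with a few lines of Python. This is only a
sketch using plain dictionaries to stand in for the two objects shown above;
the helper names "da_font_name" and "font_is_defined" are made up for this
example and do not correspond to anything in Poppler.

```python
import re


def da_font_name(da):
    """Return the font name used by the Tf operator in a DA string, if any."""
    match = re.search(r"/(\S+)\s+\S+\s+Tf", da)
    return match.group(1) if match else None


def font_is_defined(da, dr_fonts):
    """Check whether the DA font name is a key of the DR /Font dictionary."""
    name = da_font_name(da)
    return name is not None and name in dr_fonts


# /Font entries of the DR dictionary in object 411
dr_fonts = {"Helv": "90 0 R", "ZaDb": "435 0 R"}

print(font_is_defined("/Helvetica 10 Tf 0 g", dr_fonts))  # False (object 417)
print(font_is_defined("/Helv 0 Tf 0 g ", dr_fonts))       # True (object 411)
```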

The interactive form dictionary also contains a default appearance ("DA")
entry. That one uses the font "Helv", which is specified in the resources. As
explained on p. 673 in the PDF spec, that (optional) "DA" serves as a
document-wide default value for the DA attribute of variable text fields.
However, since the text field has its own value, the default value is not used
for the form element in the given PDF file and the mismatch as described above
occurs.
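
The precedence described here can be sketched as follows. The function
"effective_da" is hypothetical and deliberately simplified: it only looks at
the field itself and at the interactive form dictionary, both modelled as
plain Python dictionaries.

```python
def effective_da(field, acroform):
    """Return the field's own DA if present, else the document-wide default."""
    if "DA" in field:
        return field["DA"]
    return acroform.get("DA")


acroform = {"DA": "/Helv 0 Tf 0 g "}
field = {"T": "S1F4", "DA": "/Helvetica 10 Tf 0 g"}
field_without_da = {"T": "S1F4"}

print(effective_da(field, acroform))             # /Helvetica 10 Tf 0 g
print(effective_da(field_without_da, acroform))  # /Helv 0 Tf 0 g
```

The second call corresponds to the situation where the field has no "DA" of
its own, so the default from the interactive form dictionary applies.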


For testing purposes, I removed the "DA" entry in the form field element for
"Startbahnhof" (line 11622) (and had "mutool" fix the xref afterwards). The
resulting PDF file is attached as "bug331697_removedDA.pdf".
With that modified file, the default DA entry specified in the interactive form
dictionary is used, the font name "Helv" used there does match and the
appearance stream is constructed as desired. The text is shown in Okular in the
filled in form as expected.


As far as I understand it so far, Poppler basically behaves in the way the PDF
specification tells it to. In order to still avoid problems with "broken" files
like the one given here, Poppler would probably have to implement some kind of
a workaround/fallback for cases where an undefined font name is being used.
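
Just to give an idea of what such a fallback could look like, here is a Python
sketch. It is one possible approach under my own assumptions, not a description
of existing Poppler behaviour: if the font named in the field's DA is
undefined, substitute the font from the interactive form dictionary's DA, or
else the first font in the DR dictionary, and keep the original size. The
function name "da_with_fallback_font" is hypothetical.

```python
import re

# Group 1 is the font name, group 2 the size operand plus the Tf operator.
_TF = re.compile(r"/(\S+)(\s+\S+\s+Tf)")


def da_with_fallback_font(da, dr_fonts, acroform_da=None):
    """Return da with an undefined font name replaced by a defined one."""
    match = _TF.search(da)
    if match is None or match.group(1) in dr_fonts:
        return da  # no Tf operator, or the font is defined: nothing to repair

    # Candidate order: font of the document-wide DA first, then any DR font.
    candidates = []
    if acroform_da:
        default_match = _TF.search(acroform_da)
        if default_match:
            candidates.append(default_match.group(1))
    candidates.extend(dr_fonts)

    for name in candidates:
        if name in dr_fonts:
            return da[:match.start(1)] + name + da[match.end(1):]
    return da  # DR has no fonts at all; leave the string untouched


dr_fonts = {"Helv": "90 0 R", "ZaDb": "435 0 R"}
print(da_with_fallback_font("/Helvetica 10 Tf 0 g", dr_fonts, "/Helv 0 Tf 0 g "))
# /Helv 10 Tf 0 g
```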


I'd like to hear other opinions on what the best way to deal with such
situations is. Should Poppler implement some mechanism to deal with files like
the one given, or not ("works as designed")? What could be a good approach?


I was notified of another PDF form where the user-visible result of filling in
the form is the same (inserted text not shown/printed), but the underlying
cause is a little different. In that document, the field element has its own
"DR" entry, which is ignored by Poppler (as suggested in implementation note
118 on p. 1118 of the PDF specification). The "DR" entry from the interactive
form dictionary is used instead (as defined in the PDF spec) which again leads
to the used font name not being defined.
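
The difference between the two lookup strategies can be sketched in Python
like this. The font name "FieldFont" and the object numbers are invented for
illustration (I am not quoting the actual contents of that file), and
"resolve_font" with its "honor_field_dr" switch is a hypothetical helper, not
a Poppler function.

```python
def resolve_font(name, acroform_dr, field_dr=None, honor_field_dr=False):
    """Return the font reference for name, or None if it is undefined."""
    fonts = dict(acroform_dr.get("Font", {}))
    if honor_field_dr and field_dr:
        # Let entries from the field-level DR extend/override the defaults.
        fonts.update(field_dr.get("Font", {}))
    return fonts.get(name)


acroform_dr = {"Font": {"Helv": "90 0 R"}}
field_dr = {"Font": {"FieldFont": "12 0 R"}}  # hypothetical field-level DR

# Field-level DR ignored, as described above: the font is undefined.
print(resolve_font("FieldFont", acroform_dr, field_dr))  # None

# Field-level DR honoured: the font can be resolved.
print(resolve_font("FieldFont", acroform_dr, field_dr, honor_field_dr=True))
# 12 0 R
```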

That file is available at
http://www.muenchen.de/rathaus/dms/Home/Stadtverwaltung/Kreisverwaltungsreferat/fachspezifisch/HA-III/Dokumente/Kfz-Zulassung/SEPA_Mandat_V_1_2_weiden.pdf
and referenced from
https://www.muenchen.de/dienstleistungsfinder/muenchen/1064314/n0/. I attached
it to this bug report as well.


I would be very glad to get some guidance on what the best way to deal with
such cases would be (e.g. implement some special handling in Poppler, try to
make the authors provide fixed PDF files and close this bug as
"invalid"/"wontfix",...).


PS: For the specific file given in this bug report, there is a new version
available on the website of "Deutsche Bahn", which no longer shows the problem:
https://www.bahn.de/p/view/mdb/bahnintern/agb/befoerderungsbedingungen/fahrgastrechteformulare/2016/mdb_220024_160401_16-fahrgastrechte-formular_de.pdf,
referenced from
https://www.bahn.de/p/view/service/auskunft/fahrgastrechte/fahrgastrechte-formular.shtml,
and attached to this bug report as well.
