[okular] [Bug 395660] okular cannot preserve annotations in some pdf files.

Mon Jun 25 21:05:41 UTC 2018

https://bugs.kde.org/show_bug.cgi?id=395660

--- Comment #7 from Tobias Deiminger <haxtibal at posteo.de> ---
(In reply to Albert Astals Cid from comment #6)
> I think the important question is, does Adobe Reader let you save stuff in
> that broken file?
Yes, Adobe Reader can save annotations in '1_PDFsam_Untitled 1.pdf', and Okular
can view the saved file afterwards. See the details below.

> If so we should try to do the same, and if we can't make
> it happen i guess we'd need some kind of visual warning (we have one in the
> command line when saving fails, but that's hardly enough)
Nothing is impossible :) I'd treat it as a learning exercise, open-ended and
with no guarantees. As this may take a looooong time, we'd better add the visual
warning as an interim solution. Or are there some experienced poppler guys out
there who would like to join?

Some details.

On a full rewrite ("Save As..."), Adobe Reader created a new XRef stream
covering objects 0..13, so there was an object 0 after saving.

On an incremental update ("Save"), Adobe Reader instead appended a new XRef
stream with /Index [2 2 6 1 18 11] to the end of the file. The original XRef
stream with /Index [1 17] was preserved, so in that case there was still no
object 0 after saving.
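
To make the two /Index arrays concrete: an /Index array is a list of
(first object number, count) pairs. Here is a minimal Python sketch that expands
them into object numbers. The helper name expand_index is made up for this
illustration and is not taken from poppler or Okular.

```python
def expand_index(index):
    """Expand an XRef stream /Index array into the object numbers it covers."""
    if len(index) % 2:
        raise ValueError("/Index must hold (first, count) pairs")
    numbers = []
    # Walk the array two items at a time: first object number, then count.
    for first, count in zip(index[0::2], index[1::2]):
        numbers.extend(range(first, first + count))
    return numbers


original = expand_index([1, 17])               # XRef stream of the original file
update = expand_index([2, 2, 6, 1, 18, 11])    # XRef stream appended on "Save"

print("original:", original)
print("update:  ", update)
print("object 0 covered:", 0 in original or 0 in update)
```

Neither array starts a subsection at 0, which matches the observation that
object 0 is still missing after the incremental update.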

The content of the full-rewrite XRef stream looked as follows:
$ dd if='1_PDFsam_Untitled 1.pdf' ibs=1 skip=12306 count=52 |
./unpredict_png.py | hexdump -e '4/1 " %02X" "\n"'
 00 00 00 00 # obj 0 free, next free object = 0, use gen 0 if reused
 01 1D FB 00
 01 20 D8 00
 01 2D 8A 00
 01 2E 59 00
 01 2F 3E 00
 02 00 01 00
 02 00 01 01
 02 00 01 02
 02 00 01 03
 02 00 03 00
 02 00 03 01
 02 00 03 02
 02 00 04 00
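
For reading the remaining rows, here is a small Python sketch that labels each
entry. It assumes the stream uses /W [1 2 1] (one type byte, a two-byte second
field, a one-byte third field); the /W value is not quoted above and is only
inferred from the four columns and the comment on the first row. The function
name describe is made up for this illustration.

```python
# Rows copied from the hexdump output above, one XRef entry per line.
ROWS = """
00 00 00 00
01 1D FB 00
01 20 D8 00
01 2D 8A 00
01 2E 59 00
01 2F 3E 00
02 00 01 00
02 00 01 01
02 00 01 02
02 00 01 03
02 00 03 00
02 00 03 01
02 00 03 02
02 00 04 00
"""


def describe(row, obj_num):
    """Describe one 4-byte entry, ASSUMING /W [1 2 1] (not confirmed above)."""
    kind = row[0]
    field2 = int.from_bytes(row[1:3], "big")
    field3 = row[3]
    if kind == 0:
        return f"obj {obj_num}: free, next free object {field2}, generation {field3}"
    if kind == 1:
        return f"obj {obj_num}: in use at byte offset {field2}, generation {field3}"
    if kind == 2:
        return f"obj {obj_num}: compressed, object stream {field2}, index {field3}"
    return f"obj {obj_num}: unknown entry type {kind}"


rows = [bytes.fromhex(line) for line in ROWS.split("\n") if line.strip()]
# The full rewrite covers objects 0..13, so numbering starts at 0.
for obj_num, row in enumerate(rows):
    print(describe(row, obj_num))
```

Under that assumption, objects 1..5 are ordinary objects at byte offsets in the
file, and objects 6..13 live inside object streams 1, 3 and 4.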

Adobe saves the stream with /DecodeParms<</Columns 4/Predictor 12>>
/Filter/FlateDecode.
So to analyze it, one has to inflate the data and then undo the PNG prediction
first. I used this quick and dirty Python script:

Listing unpredict_png.py

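#!/usr/bin/python3
# Inflate a Flate-compressed XRef stream from stdin and undo PNG "Up" prediction.
# Quick and dirty: assumes /Columns 4 and treats every row as filter type 2 (Up).
import sys
import zlib

COLUMNS = 4  # from /DecodeParms
predicted = zlib.decompress(sys.stdin.buffer.read())
# Each row is one filter-type byte followed by COLUMNS data bytes; drop the tag.
rows = [predicted[i + 1:i + 1 + COLUMNS] for i in range(0, len(predicted), COLUMNS + 1)]
prev = bytearray(COLUMNS)  # the row above the first row counts as all zeros
for row in rows:
    for i, delta in enumerate(row):
        prev[i] = (delta + prev[i]) & 0xFF  # Up: add the byte directly above
    sys.stdout.buffer.write(prev)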
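
The script above ignores the filter-type byte at the start of each row. With PNG
predictors each row carries its own tag, so a slightly more careful variant could
look like the sketch below. It handles only tag 0 (None) and tag 2 (Up) and
raises on anything else; the function name and the round-trip check are
illustrative, not part of any existing tool.

```python
import zlib


def unpredict_png(data, columns):
    """Undo PNG prediction for rows of `columns` bytes, honoring each row's tag."""
    stride = columns + 1  # one tag byte plus the row data
    if len(data) % stride:
        raise ValueError("data length is not a whole number of rows")
    out = bytearray()
    prev = bytes(columns)  # the row above the first row counts as all zeros
    for start in range(0, len(data), stride):
        tag = data[start]
        row = data[start + 1:start + stride]
        if tag == 0:    # None: bytes are stored unchanged
            cur = bytes(row)
        elif tag == 2:  # Up: each byte is a delta from the byte directly above
            cur = bytes((d + p) & 0xFF for d, p in zip(row, prev))
        else:
            raise NotImplementedError(f"PNG filter type {tag} is not handled here")
        out += cur
        prev = cur
    return bytes(out)


# Round trip on the first three decoded rows from the dump above.
plain = bytes.fromhex("00000000 011DFB00 0120D800")
encoded = bytearray()
above = bytes(4)
for i in range(0, len(plain), 4):
    row = plain[i:i + 4]
    encoded.append(2)  # tag every row as Up
    encoded += bytes((c - p) & 0xFF for c, p in zip(row, above))
    above = row
stream = zlib.compress(bytes(encoded))
assert unpredict_png(zlib.decompress(stream), 4) == plain
```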
