[okular] [Bug 402017] Cannot save PDF when loaded file has been deleted

David Hurka bugzilla_noreply at kde.org
Sat Jul 25 17:24:28 BST 2020


https://bugs.kde.org/show_bug.cgi?id=402017

--- Comment #44 from David Hurka <david.hurka at mailbox.org> ---
> Personally, I would suggest trying to figure out why poppler fails and
> fixing it rather than doing all the strange things that need the user to read
> lots of stuff that has "if this and if that and not that"

I looked at what happens when Okular tries to save a *modified* *PDF* document.
In this case, the Generator is able to save the document including the changes
to a new file.

<investigation>

“A -> B” denotes “function A calls function B”.
“Poppler_qt5” denotes the Qt5 interface of Poppler, as Okular sees it.
“Poppler” denotes Poppler internals.

1.1. The Part requests that the document be saved to a local file:

Part::saveAs() -> ... -> PDFGenerator::save()

1.2. PDFGenerator::save() creates a new Poppler_qt5::PDFConverter from the
existing Poppler_qt5::Document, and uses it to save the document:

PDFGenerator::save() -> Poppler_qt5::Document::pdfConverter()->convert()

1.3. Poppler_qt5::PDFConverter::convert() creates a
Poppler::QIODeviceOutStream, which is a Poppler::OutStream, on the local
output file. Then it has the document saved into this OutStream:

Poppler_qt5::PDFConverter::convert() -> Poppler::PDFDoc::saveAs() ->
Poppler::PDFDoc::saveIncrementalUpdate()

1.4. Poppler::PDFDoc::saveIncrementalUpdate() creates a copy of “str” (see
2.2), which yields a new Poppler::FileStream.

1.5. Poppler::PDFDoc::saveIncrementalUpdate() first makes a verbatim copy from
this FileStream to the OutStream. Later it modifies the output file within the
OutStream to reflect the modifications.

So where does the input data Poppler::FileStream come from?

2.1. Document::openDocument() calls the PDFGenerator to open the PDF document
from a local file. (It can also pass the data in a QByteArray, but that is only
relevant when the document comes from stdin.)

Document::openDocument() -> ... -> PDFGenerator::loadDocumentWithPassword() ->
Poppler_qt5::Document::load() -> new Poppler::DocumentData() -> new
Poppler::PDFDoc() -> Poppler::GooFile::open() -> poppler/gfile.cc:
openFileDescriptor() -> open()

2.2. Poppler::PDFDoc::PDFDoc() makes the new GooFile accessible as
Poppler::FileStream, which is a Poppler::BaseStream, and stores it as “str”.

So what does reading from this Poppler::FileStream do?

3.1. As seen in 1.5, Poppler::PDFDoc::saveIncrementalUpdate() reads the whole
input file from the FileStream:

Poppler::PDFDoc::saveIncrementalUpdate() -> Poppler::FileStream::getChar() ->
Poppler::FileStream::fillBuf() -> Poppler::GooFile::read() -> unistd.h: pread()
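The read path in 3.1 can be modelled in a few lines. The following is a minimal sketch, not Poppler's code: the class `BufferedFdStream` and the helpers `readAll()` and `makeTempFile()` are hypothetical names, and the 256-byte buffer size is an arbitrary assumption. It only mirrors the pattern described above: `getChar()` serves bytes from a private buffer and, once that is empty, `fillBuf()` refills it with `pread()` at an absolute offset on a descriptor the stream does not own.

```cpp
#include <cassert>    // for the usage checks
#include <cstddef>
#include <cstdlib>    // mkstemp()
#include <string>
#include <sys/types.h>
#include <unistd.h>   // pread(), write(), close(), unlink()

// Hypothetical, simplified model of a buffered file stream (NOT Poppler's
// FileStream): it borrows a file descriptor, keeps its own buffer and file
// position, and refills the buffer with pread() at an absolute offset.
class BufferedFdStream {
public:
    explicit BufferedFdStream(int fd, off_t start = 0) : fd_(fd), pos_(start) {}

    // Returns the next byte, or -1 at end of file or on a read error.
    int getChar() {
        if (bufPos_ == bufEnd_ && !fillBuf()) {
            return -1;
        }
        return static_cast<unsigned char>(buf_[bufPos_++]);
    }

private:
    // Refills the buffer from the current file position.
    bool fillBuf() {
        ssize_t n = pread(fd_, buf_, sizeof buf_, pos_);
        if (n <= 0) {
            return false;  // 0 means end of file, -1 means an I/O error
        }
        pos_ += n;
        bufPos_ = 0;
        bufEnd_ = static_cast<std::size_t>(n);
        return true;
    }

    int fd_;
    off_t pos_;
    char buf_[256];
    std::size_t bufPos_ = 0;
    std::size_t bufEnd_ = 0;
};

// Drains the stream byte by byte, like a verbatim copy of the input file.
std::string readAll(BufferedFdStream &stream) {
    std::string out;
    for (int c = stream.getChar(); c != -1; c = stream.getChar()) {
        out.push_back(static_cast<char>(c));
    }
    return out;
}

// Hypothetical test helper: creates a temporary file holding `contents` and
// returns an open read/write descriptor (or -1); the name goes into `path`.
int makeTempFile(const std::string &contents, std::string &path) {
    char tmpl[] = "/tmp/pread-sketch-XXXXXX";
    int fd = mkstemp(tmpl);
    if (fd < 0) {
        return -1;
    }
    path = tmpl;
    ssize_t written = write(fd, contents.data(), contents.size());
    if (written != static_cast<ssize_t>(contents.size())) {
        close(fd);
        unlink(tmpl);
        return -1;
    }
    return fd;
}
```

Because `pread()` takes an explicit offset, several such streams can share one descriptor without disturbing each other's position. The flip side is that every refill goes back to the file behind that descriptor.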

So what happens in 1.4, when the stream is copied?

4.1. Poppler::PDFDoc::saveIncrementalUpdate() -> Poppler::FileStream::copy() ->
new Poppler::FileStream()

4.2. copy() creates a new stream on the same GooFile, but with an empty buffer.
This means that the first call to FileStream::getChar() on the copy will call
pread() on Poppler’s input file handle.
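To make the consequence of 4.2 concrete, here is another minimal sketch under similar assumptions (the class `SharedFdStream` and the helper `makeTempFile()` are hypothetical, not Poppler's API). `copy()` returns a new stream on the same descriptor and start offset but with an empty buffer, so the copy's first `getChar()` goes back to the file, while the original can still serve stale bytes from its buffer. Overwriting the file in place with `pwrite()` stands in for the file changing underneath Poppler.

```cpp
#include <cassert>    // for the usage checks
#include <cstddef>
#include <cstdlib>    // mkstemp()
#include <string>
#include <sys/types.h>
#include <unistd.h>   // pread(), pwrite(), write(), close(), unlink()

// Hypothetical stream on a borrowed descriptor; models the copy in 4.2.
class SharedFdStream {
public:
    SharedFdStream(int fd, off_t start) : fd_(fd), start_(start), pos_(start) {}

    // Same descriptor and start offset, but a fresh, empty buffer.
    SharedFdStream copy() const { return SharedFdStream(fd_, start_); }

    // Returns the next byte, or -1 at end of file or on a read error.
    int getChar() {
        if (bufPos_ == bufEnd_) {
            ssize_t n = pread(fd_, buf_, sizeof buf_, pos_);  // refill from disk
            if (n <= 0) {
                return -1;
            }
            pos_ += n;
            bufPos_ = 0;
            bufEnd_ = static_cast<std::size_t>(n);
        }
        return static_cast<unsigned char>(buf_[bufPos_++]);
    }

private:
    int fd_;
    off_t start_;
    off_t pos_;
    char buf_[256];
    std::size_t bufPos_ = 0;
    std::size_t bufEnd_ = 0;
};

// Hypothetical test helper: creates a temporary file holding `contents` and
// returns an open read/write descriptor (or -1); the name goes into `path`.
int makeTempFile(const std::string &contents, std::string &path) {
    char tmpl[] = "/tmp/copy-sketch-XXXXXX";
    int fd = mkstemp(tmpl);
    if (fd < 0) {
        return -1;
    }
    path = tmpl;
    ssize_t written = write(fd, contents.data(), contents.size());
    if (written != static_cast<ssize_t>(contents.size())) {
        close(fd);
        unlink(tmpl);
        return -1;
    }
    return fd;
}
```

In this sketch, nothing the copy reads comes from the original's buffer: after the in-place overwrite, the original still returns `'A'` from memory while the copy's first `getChar()` returns `'B'` from disk.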

</investigation>

So there we are: the Poppler feature that we had already identified as
stream-from-disk is implemented by calling pread(). pread() relies on the
integrity of Poppler’s input file handle, which means Okular is responsible for
guaranteeing that integrity. Since Okular passes the path to a local file, we
rely on the OS, which should not modify or delete the input file while Poppler
holds its file handle on it.

This works fine in these cases:
A) Okular opens a remote file. ReadOnlyPart makes a temporary file as a local
copy, so integrity is guaranteed.
B) Okular opens a persistent local file (e.g. ~/Mess/LM555.pdf). Integrity is
guaranteed as long as the user does not intentionally modify the file.

This does not work well in this case:
C) Okular opens a temporary local file because Firefox told it to do so.
Integrity is not guaranteed, because Firefox will soon delete this file.
Before commit 559836c3, Okular let Poppler try to read the deleted file
through its still-open file handle. Given 4.2, this should work fine if the
system uses, e.g., ext4.

This does not work at all in this case:
D) Okular opens a file from a thumb drive, and later the thumb drive fails.
Because the data is now completely unavailable, Poppler can’t read any data
through its file handle.
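Case C relies on POSIX unlink semantics: removing a file's name does not free its data while a descriptor on it is still open, so pread() keeps working until the last descriptor is closed. The sketch below demonstrates this under stated assumptions: `readAfterUnlink()` is a hypothetical helper, and the behaviour shown is what local POSIX filesystems such as ext4 provide, not something every filesystem (for example network or FUSE mounts) is guaranteed to match.

```cpp
#include <cassert>    // for the usage checks
#include <cstddef>
#include <cstdlib>    // mkstemp()
#include <fcntl.h>    // open()
#include <string>
#include <sys/types.h>
#include <unistd.h>   // pread(), unlink(), close(), access(), write()

// Hypothetical helper: opens `path`, deletes its directory entry (as Firefox
// does with its temporary download), then reads up to `len` bytes through
// the still-open descriptor. Returns the bytes read, or "" on error.
std::string readAfterUnlink(const char *path, std::size_t len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return {};
    }
    if (unlink(path) != 0) {
        close(fd);
        return {};
    }
    std::string out(len, '\0');
    ssize_t n = pread(fd, &out[0], len, 0);  // the name is gone, the data is not
    close(fd);  // last reference dropped: only now may the data be freed
    if (n < 0) {
        return {};
    }
    out.resize(static_cast<std::size_t>(n));
    return out;
}
```

In this sketch, calling `readAfterUnlink(path, 8)` on a file that starts with `%PDF-1.7` returns those eight bytes even though `access(path, F_OK)` already reports the name as gone.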

Apparently we are mostly concerned about case C. How about letting PDFGenerator
try to save the file even if the source file is detected to be modified? If
PDFGenerator fails, we can inform the user that the PDF backend is to blame.



More information about the Okular-devel mailing list