[Bug 280772] Dr konqui appears when I open a directory

Wed Oct 5 22:52:58 BST 2011

https://bugs.kde.org/show_bug.cgi?id=280772

Sven Wehner <sven at atelophobia.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sven at atelophobia.de

--- Comment #4 from Sven Wehner <sven atelophobia de>  2011-10-05 21:52:57 ---
I have the same problem with a completely different PDF file (which is >80 MB,
so I won't attach it).

Some minor tests using the provided pdf file showed:
1. The file name doesn't matter. If I rename the file to "a.pdf" nothing
changes.
2. Your file results in the same crash for me.
3. The problem seems to be related to the file itself, because if you use
"pdftk Lantmateriforrattning.pdf output a.pdf" to read and write back the file,
the newly created file doesn't provoke the crash. (pdftk's man page calls this
"Repair a PDF's corrupted XREF table and stream lengths, if possible".)
4. The strangest thing happens when you apply the same procedure to the
previously created file: it provokes the crash again. Any further
iteration still provokes the crash, and the file changes every time (which
could be a problem in pdftk).
5. The first iteration increases the file size from 17.3 KiB to 18.6 KiB, the
next iteration decreases the file size by 6 bytes, and any further iteration
doesn't change the file size. Nonetheless, the SHA sums keep changing.
6. The metadata doesn't contain any fancy Unicode characters (in fact, "pdftk
Lantmateriforrattning.pdf dump_data_utf8 | enca -L none" gives you "7bit ASCII
characters").
7. The dumped data (pdftk ... dump_data ...) produces the same output for each
iteration.
8. A binary diff shows that the difference between the files is quite large.
Actually, it seems like the complete PDF is rebuilt.
9. If you run xmlindexer manually, it doesn't crash. However, it reports "Error
in parsing: Keyword obj not found". This is reported even for the non-crashing
version and for other PDF files that don't cause any kind of problem.
10. If you run kfilemetadatareader manually, it does crash.

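The error comes from a std::string::assign() call, which throws a
std::length_error exception. This can happen if the size argument is
negative: the parameter is an unsigned size_type, so a negative value is
converted to a huge length that exceeds max_size().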
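
To illustrate that failure mode, here is a minimal sketch. It is not taken
from the Strigi sources: assignThrows() is a hypothetical helper, and the
cast only spells out the conversion that would otherwise happen implicitly
in a call like "lastName.assign(s, pos-s)".

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether assign(s, pos - s) throws
// std::length_error for the given pair of pointers into one buffer.
bool assignThrows(const char* s, const char* pos) {
    std::string lastName;
    try {
        // pos - s is a signed std::ptrdiff_t. If pos lies before s, the
        // negative value becomes a huge std::size_t, larger than max_size().
        lastName.assign(s, static_cast<std::size_t>(pos - s));
    } catch (const std::length_error&) {
        return true;
    }
    return false;
}
```

With a buffer "abcdef", assignThrows(buf, buf + 3) should return false, while
assignThrows(buf + 4, buf + 2) should return true.
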
The relevant source code seems to be located in file "lib/pdf/pdfparser.cpp",
function "PdfParser::parseName()", line 264. Especially the line
"lastName.assign(s, pos-s);" looks suspicious to me.
First, I guess the lines "skipNotFromString("()<>[]{}/%\t\n\f\r ", 16)" should
use 15, shouldn't they? Even better, the functions would not need manual
length parameters at all.
Second, isn't it possible that the StreamStatus r is "Eof"?
Third, parseName() uses skipNotFromString(), which uses checkForData(), which
uses read(), which uses stream->read(start, min, max). The documentation of
StreamBase::read() states: "@param start pointer passed by reference that will
be set to point to the retrieved array of items. If the end of the stream is
encountered or an error occurs, the value of @p start is undefined". Are you
sure that start is still valid? For instance, a start > pos would result in a
negative size for the assign() call. I think a quick check for "pos-s > 0"
would be a good stopgap until the problem itself is better understood.
Fourth, I don't fully understand what the code is doing :(
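
The first and third points could be sketched roughly as follows. This is only
an illustration under assumptions, not a patch against pdfparser.cpp:
literalLength() and assignRange() are hypothetical names, and the real
skipNotFromString() signature is not reproduced here. Note that a length of 16
would also cover the terminating NUL of the literal, which PDF treats as a
white-space character, so the original value may be deliberate.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: N is deduced from the string literal, so N - 1 is
// the number of characters without the terminating NUL.
template <std::size_t N>
constexpr std::size_t literalLength(const char (&)[N]) {
    return N - 1;
}

// The delimiter set quoted above contains 15 characters (16 with the NUL).
static_assert(literalLength("()<>[]{}/%\t\n\f\r ") == 15,
              "unexpected delimiter count");

// Hypothetical guarded variant of the suspicious assign() call. It refuses
// null pointers and ranges where pos lies before s, and leaves 'out'
// untouched in that case. Both pointers must refer to the same buffer.
bool assignRange(std::string& out, const char* s, const char* pos) {
    if (s == nullptr || pos == nullptr || pos < s) {
        return false;
    }
    out.assign(s, static_cast<std::size_t>(pos - s));
    return true;
}
```

Unlike the "pos-s > 0" check, this sketch accepts an empty range (pos == s);
whether an empty name should be accepted is a decision for the parser. A
caller would then treat a false return value as a parse error instead of
letting the exception escape.
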

I saw that in revision 244e3949c8d1ef2c99119ca3ce6f18aa32199d3e Vishesh Handa
started writing a Poppler-based PDF parser. First, does it fully replace the
part that causes these troubles? And second, are you going to fix the old one
anyway?
