[Okular-devel] [Bug 161324] recognise columns in the text of a page

Robert Knight robertknight at gmail.com
Fri Sep 17 11:21:57 CEST 2010


https://bugs.kde.org/show_bug.cgi?id=161324





--- Comment #37 from Robert Knight <robertknight gmail com>  2010-09-17 11:21:53 ---
> Does poppler guess the text layout using some generic heuristic algorithm, or
> use some explicit information on text ordering embedded in the PDF format?

PDFs do not normally contain layout information about how text is structured
into paragraphs and columns (tagged PDFs, which carry a logical structure
tree, are the exception).  As I understand it, what PDF provides is
essentially a list of commands that say "draw string S at position P with
font F".
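
To make the "draw string S at position P with font F" model concrete, here is
a minimal sketch in Python.  The names TextDraw, naive_reading_order and
column_aware_order are hypothetical and are not taken from Poppler or Okular,
and the gutter position is simply assumed to be known, which is exactly the
part a real heuristic has to guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextDraw:
    """One hypothetical 'draw string S at position P with font F' command."""
    text: str
    x: float  # horizontal position, increasing to the right
    y: float  # vertical position, increasing upward as in PDF user space
    font: str


# A two-column page: nothing here says which fragments belong together.
page = [
    TextDraw("Left column, line 1", 50.0, 700.0, "Times"),
    TextDraw("Right column, line 1", 320.0, 700.0, "Times"),
    TextDraw("Left column, line 2", 50.0, 686.0, "Times"),
    TextDraw("Right column, line 2", 320.0, 686.0, "Times"),
]


def naive_reading_order(draws):
    """Sort top-to-bottom, then left-to-right, ignoring columns entirely."""
    return sorted(draws, key=lambda d: (-d.y, d.x))


def column_aware_order(draws, gutter_x):
    """Read everything left of gutter_x first, then everything right of it.

    gutter_x is an assumption supplied by the caller; a real extractor
    would have to infer it from the positions alone.
    """
    left = [d for d in draws if d.x < gutter_x]
    right = [d for d in draws if d.x >= gutter_x]
    return naive_reading_order(left) + naive_reading_order(right)
```

The naive ordering interleaves the two columns line by line, while the
column-aware one reads the left column to the end before starting the right
one.  Recognising columns amounts to recovering something like gutter_x from
the coordinates.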

I haven't looked into recent versions of Poppler, but older versions used some
fairly complex heuristic algorithms to piece together the layout from that
input.  These algorithms had some interesting flaws.  If I remember correctly,
numerical instability meant that the order of paragraphs in the output text
could differ significantly depending on the processor on which the code ran.
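
As an illustration of how a rounding difference can reorder text, the Python
sketch below sorts fragments by their coordinates.  It is not Poppler's
algorithm, and it does not reproduce a real processor difference; it only
assumes that two machines might round the same baseline computation slightly
differently, and simulates that with two values that are equal on paper but
not in floating point.  The function name reading_order is hypothetical.

```python
def reading_order(fragments):
    """Return fragment names sorted top-to-bottom, then left-to-right.

    Each fragment is a (name, x, y) tuple; y increases upward.
    The comparison on y is exact, which is what makes it fragile.
    """
    ordered = sorted(fragments, key=lambda f: (-f[2], f[1]))
    return [name for name, _x, _y in ordered]


# The same baseline, obtained two ways.  Mathematically both are 0.3,
# but the sum is rounded to a slightly larger double.
y_exact = 0.3
y_accumulated = 0.1 + 0.2

# "Machine A" (assumed): both fragments end up with an identical baseline,
# so the x coordinate decides and "left" comes first.
fragments_a = [("right", 300.0, y_exact), ("left", 50.0, y_exact)]

# "Machine B" (assumed): one baseline carries a tiny rounding error,
# so the y comparison decides and "right" jumps ahead.
fragments_b = [("right", 300.0, y_accumulated), ("left", 50.0, y_exact)]
```

A comparison with a tolerance, rather than an exact one, would treat the two
baselines as the same line and give the same order in both cases.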
