[okular] [Bug 418520] Find function misses occurrences of target string that wrap from one line to next line of document text.

avlas bugzilla_noreply at kde.org
Sun Mar 29 20:10:05 BST 2020


https://bugs.kde.org/show_bug.cgi?id=418520

--- Comment #11 from avlas <jsardid at gmail.com> ---
(In reply to David Hurka from comment #10)
> > I assume there is no simple heuristic to workaround these
> > wrongly formatted pdfs, which highly affect features such
> > as searching, highlighting and selecting/extracting text.
> 
> It’s that TextEntity reordering thing.
> 
> @avlas Can you search for
> 
>     will overshadowing would apply
> 
> (in the Thumbnails panel, not in the search bar), so we can see the geometry
> of the TextEntity objects? If the words are clearly separated between the
> columns, it’s a problem with Okular.
> 
> Okular breaks the document apart into single letters, and then reorders them
> based on their positions. It uses XY-Cut to separate columns, so it needs
> some horizontal space between them. That’s pretty useful for many PDFs that
> are around on the web (like MeanWell datasheets...), but sometimes doesn’t
> work.
> 
> It looks like it’s a scanned paper. If it isn’t aligned perfectly vertically,
> the columns overlap, and XY-Cut fails.
> 
> https://phabricator.kde.org/source/okular/browse/master/core/textpage.cpp;9694113a961cb5a5d6ef18ce0beeaa975a8c6db3$1890
> if you are interested...
> 
> Of course it may still be a problem with the PDF. To check that, you can
> open it in e.g. Firefox and select some text.
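The XY-Cut idea quoted above, and why a tilted scan defeats it, can be illustrated with a minimal Python sketch. This is not Okular's implementation (that lives in C++ in core/textpage.cpp); the `Box` class, the gap thresholds, the "vertical cut first" order and the toy page are all hypothetical choices made for illustration. The page is split recursively wherever an empty strip at least `min_gap` wide exists; a tilted scan is mimicked by shifting the second line to the right (the shift is deliberately exaggerated).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical word box: the word plus its left/top/right/bottom coordinates."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def find_gap(boxes: List[Box], lo: str, hi: str, min_gap: float) -> Optional[float]:
    """Project boxes onto one axis and return the middle of the widest empty gap.

    Returns None when no gap is at least min_gap wide.
    """
    spans = sorted((getattr(b, lo), getattr(b, hi)) for b in boxes)
    best_mid, best_width = None, min_gap
    reach = spans[0][1]  # furthest end covered so far along this axis
    for start, end in spans[1:]:
        gap = start - reach
        if gap >= best_width:
            best_mid, best_width = (reach + start) / 2, gap
        reach = max(reach, end)
    return best_mid


def xy_cut(boxes: List[Box], min_gap_x: float = 10.0, min_gap_y: float = 5.0) -> List[Box]:
    """Recursively split the page at empty strips and return boxes in reading order."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try a vertical cut first: a wide enough empty vertical strip separates columns.
    x = find_gap(boxes, "left", "right", min_gap_x)
    if x is not None:
        left = [b for b in boxes if b.right <= x]
        right = [b for b in boxes if b.left >= x]
        return xy_cut(left, min_gap_x, min_gap_y) + xy_cut(right, min_gap_x, min_gap_y)
    # Otherwise try a horizontal cut between lines or blocks.
    y = find_gap(boxes, "top", "bottom", min_gap_y)
    if y is not None:
        upper = [b for b in boxes if b.bottom <= y]
        lower = [b for b in boxes if b.top >= y]
        return xy_cut(upper, min_gap_x, min_gap_y) + xy_cut(lower, min_gap_x, min_gap_y)
    # No gap in either direction: fall back to top-to-bottom, left-to-right order.
    return sorted(boxes, key=lambda b: (b.top, b.left))


def two_column_page(skew: float) -> List[Box]:
    """Toy two-column page; the second line is shifted right by `skew` to mimic a tilted scan."""
    rows = [
        (0, [("the", 0, 30), ("quick", 35, 80), ("jumps", 100, 145), ("over", 150, 180)]),
        (15, [("brown", 0, 45), ("fox", 50, 80), ("lazy", 100, 135), ("dogs", 140, 180)]),
    ]
    boxes = []
    for top, words in rows:
        shift = skew if top > 0 else 0.0
        for text, l, r in words:
            boxes.append(Box(text, l + shift, top, r + shift, top + 10))
    return boxes


def reading_order(boxes: List[Box]) -> str:
    return " ".join(b.text for b in xy_cut(boxes))


print(reading_order(two_column_page(skew=0)))   # the quick brown fox jumps over lazy dogs
print(reading_order(two_column_page(skew=25)))  # the quick jumps over brown fox lazy dogs
```

With the straight page, the empty strip between the columns yields a vertical cut and each column is read as a whole. With the shifted line, that strip is covered, so the first cut is horizontal and each line is read across both columns, interleaving them as described in the quoted comment.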

Please see:

https://i.imgur.com/OV7BLRx.png

I checked it in Chromium and it seems to work fine. Please see the same
example in Chromium when typing "circumstances":

https://i.imgur.com/8vn1Kpp.png

This is an official paper that I downloaded from a journal, but since the
paper is from 1975, I am not sure about the technical details of the PDF.
Still, basic text handling (selecting, highlighting, etc.) seems to work fine
as long as line breaks and columns are not involved. Those cases fail in Okular
but work fine in Chromium, so the difference may lie in Okular's heuristic
compared to Chromium's.
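Why a wrong reading order breaks the Find function for phrases that wrap onto the next line can be shown with a minimal Python sketch. It is not Okular's search code; the function name `find_across_lines` and the toy lines are hypothetical. The search treats each line break as an ordinary space, so it only finds a wrapped phrase when the lines are already in the correct reading order.

```python
from typing import List


def find_across_lines(lines: List[str], target: str) -> bool:
    """Case-insensitive phrase search that treats line breaks like ordinary spaces.

    `lines` must already be in reading order; the search itself cannot repair a bad order.
    """
    # Join the lines with a space so a phrase may continue on the next line,
    # and collapse any run of whitespace to a single space on both sides.
    text = " ".join(" ".join(lines).split()).casefold()
    needle = " ".join(target.split()).casefold()
    return needle in text


# Lines of a toy two-column page, once in correct order and once interleaved
correct = ["the quick", "brown fox", "jumps over", "lazy dogs"]
interleaved = ["the quick", "jumps over", "brown fox", "lazy dogs"]

print(find_across_lines(correct, "quick brown"))      # True: the phrase wraps onto the next line
print(find_across_lines(interleaved, "quick brown"))  # False: the next line comes from the other column
```

Because the match is a plain substring test, it also matches inside longer words, much like a typical Find bar; the point here is only that the same search gives different answers depending on the line order it is fed.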

More information about the Okular-devel mailing list