[okular] [Bug 418520] Find function misses occurrences of target string that wrap from one line to next line of document text.

David Hurka bugzilla_noreply at kde.org
Sun Mar 29 19:51:20 BST 2020


https://bugs.kde.org/show_bug.cgi?id=418520

--- Comment #10 from David Hurka <david.hurka at mailbox.org> ---
> I assume there is no simple heuristic to workaround these
> wrongly formatted pdfs, which highly affect features such
> as searching, highlighting and selecting/extracting text.

It’s that TextEntity reordering thing.

@avlas Can you search for

    will overshadowing would apply

(in the Thumbnails panel, not in the search bar), so we can see the geometry of
the TextEntity objects? If the words are clearly separated between the columns,
it’s a problem with Okular.

Okular breaks the document apart into single letters, and then reorders them
based on their positions. It uses XY-Cut to separate columns, so it needs some
horizontal space between them. That’s pretty useful for many PDFs which are
around on the web (like MeanWell datasheets...), but sometimes doesn’t work.
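
To illustrate why the horizontal space matters, here is a minimal sketch of
the column-separating step only. It is not Okular's code (see the textpage.cpp
link below for that); the function name split_columns, the min_gap threshold
and the box coordinates are all made up for illustration. A full XY-Cut also
cuts along the other axis and recurses, which this sketch leaves out.

```python
def split_columns(boxes, min_gap=10.0):
    """Group (left, right) horizontal extents into columns.

    A new column starts wherever an empty vertical strip at least
    min_gap wide separates a box from everything to its left.
    Hypothetical helper, not part of Okular.
    """
    if not boxes:
        return []
    ordered = sorted(boxes)          # sort by left edge
    columns = [[ordered[0]]]
    reach = ordered[0][1]            # rightmost edge seen so far
    for left, right in ordered[1:]:
        if left - reach >= min_gap:
            columns.append([(left, right)])    # wide gap: start a new column
        else:
            columns[-1].append((left, right))  # too close: same column
        reach = max(reach, right)
    return columns


# Two words per column, with a 30-unit gutter between the columns.
words = [(10, 40), (45, 90), (120, 150), (155, 200)]
columns = split_columns(words)
print(len(columns))
```

With these made-up coordinates the 30-unit gutter is wider than min_gap, so
the words end up in two columns. If a box reached into the gutter, everything
would be merged into one column and the reading order would mix both columns.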

It looks like it’s a scanned paper. If the scan isn’t aligned perfectly
vertically, the columns overlap horizontally, and XY-Cut fails.
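
A rough back-of-the-envelope sketch of that effect, under simplifying
assumptions: the skew is treated as a shear, so a column of height h drifts
sideways by h * tan(angle) from top to bottom, and the empty vertical strip
between two columns shrinks by that amount. The function name projected_gap
and all numbers are hypothetical, not taken from the PDF in this bug.

```python
import math


def projected_gap(gap, column_height, skew_degrees):
    """Width of the empty vertical strip left between two columns on a
    skewed page. A result <= 0 means the columns overlap horizontally,
    so a vertical cut between them is no longer possible.
    Simplified shear model, for illustration only.
    """
    drift = column_height * math.tan(math.radians(abs(skew_degrees)))
    return gap - drift


# Assumed layout: 20-unit gutter, 700-unit tall columns.
for angle in (0.0, 1.0, 2.0):
    print(angle, projected_gap(20.0, 700.0, angle) > 0)
```

In this made-up layout the gutter survives a skew of 1 degree, but is gone at
2 degrees, even though the page would still look almost straight to the eye.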

https://phabricator.kde.org/source/okular/browse/master/core/textpage.cpp;9694113a961cb5a5d6ef18ce0beeaa975a8c6db3$1890
if you are interested...

Of course it may still be a problem with the PDF. To check that, you can open
it in e.g. Firefox and select some text.

-- 
You are receiving this mail because:
You are the assignee for the bug.

