[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

Sun May 12 13:38:16 BST 2019

https://bugs.kde.org/show_bug.cgi?id=407133

--- Comment #1 from David Hurka <david.hurka at mailbox.org> ---
Created attachment 120007
  --> https://bugs.kde.org/attachment.cgi?id=120007&action=edit
Vertical texts are used for diagrams, but Okular can’t search for them

You can fix the clipboard content with the following command ;)
perl -lne 'print scalar reverse'

It seems that the TextPage, which is used for searching and copying text, is
filled as follows: the Generator adds horizontal words as whole words, but
splits vertical words into single letters. Okular then treats the uppermost
letter as the first letter. Since vertical axis labels are usually read from
bottom to top, the copied word comes out reversed.
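To make the reversal concrete, here is a minimal Python sketch. It does not use
Okular's API: `Letter` is a hypothetical stand-in for a one-letter text entry
with a bounding box in page coordinates where y grows downward, and
`reading_order_text` assumes, as described above, that letters are ordered from
top to bottom. A word rotated 90° counterclockwise (read from bottom to top)
then comes out reversed.

```python
from dataclasses import dataclass


@dataclass
class Letter:
    """Hypothetical single-letter text entry (not Okular's TextEntity)."""
    char: str
    x: float       # left edge of the bounding box
    y: float       # top edge; y grows downward
    width: float
    height: float


def reading_order_text(letters):
    # Assumed behaviour: order letters top to bottom, then left to right.
    ordered = sorted(letters, key=lambda l: (l.y, l.x))
    return "".join(l.char for l in ordered)


def rotated_word(word, x=10.0, bottom=100.0, size=5.0):
    # Lay out a word rotated 90° counterclockwise: the first letter sits at
    # the bottom and each following letter is one letter height higher.
    return [Letter(c, x, bottom - (i + 1) * size, size, size)
            for i, c in enumerate(word)]


letters = rotated_word("Voltage")
print(reading_order_text(letters))  # prints "egatloV"
```

Reversing each line of the clipboard, as the one-liner above does, undoes
exactly this effect.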

Each letter or word is stored as a TextEntity object in the TextPage. A
TextEntity stores the letter or word as a string, together with its bounding
rectangle.

The problem is one of these two (choose whichever you prefer):
1. TextPage and TextEntity can’t store transformations, not even a simple
rotation. So the generator splits vertical words into single letters. *1
2. The generator, which uses poppler to read the PDF, already receives
vertical words split into letters.
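Option 1 can be illustrated with a hypothetical entry that carries a rotation
angle, so that a vertical word stays a single entry and a plain text search over
the page finds it. `RotatedEntry`, its `rotation` field and `page_text` are
assumptions made up for this sketch, not Okular's classes.

```python
from dataclasses import dataclass


@dataclass
class RotatedEntry:
    """Hypothetical text entry: a whole word, its bounding box and rotation."""
    text: str
    x: float
    y: float        # top edge; y grows downward
    width: float
    height: float
    rotation: int = 0  # counterclockwise degrees; would matter for highlighting,
                       # but plays no role in ordering here


def page_text(entries):
    # Each entry keeps its word intact, so ordering only places whole words;
    # the letters inside a vertical word are never re-sorted.
    ordered = sorted(entries, key=lambda e: (e.y, e.x))
    return " ".join(e.text for e in ordered)


entries = [
    RotatedEntry("Output", 30.0, 10.0, 30.0, 5.0),
    RotatedEntry("Voltage", 10.0, 20.0, 5.0, 35.0, rotation=90),
    RotatedEntry("Time", 40.0, 60.0, 20.0, 5.0),
]
text = page_text(entries)
print(text)               # prints "Output Voltage Time"
print("Voltage" in text)  # prints True
```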

*1) Possible reason: this way, one can (theoretically *2) use the Text
Selection tool to select the word.
*2) In practice this does not work, because Okular also adds every other
letter at the same height to the selection.
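Footnote *2 can be reproduced with a simplified model of line-based selection,
in which every letter whose vertical extent overlaps the selected band is taken.
This is an assumed model for illustration, not Okular's actual selection code,
and `Box` and `select_band` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Box:
    char: str
    x: float
    y: float       # top edge; y grows downward
    height: float


def select_band(boxes, top, bottom):
    # Assumed line-based selection: take every letter whose vertical extent
    # overlaps (top, bottom), regardless of its horizontal position.
    hits = [b for b in boxes if b.y < bottom and b.y + b.height > top]
    return "".join(b.char for b in sorted(hits, key=lambda b: (b.y, b.x)))


# A vertical label "mA" at x=5 (read bottom to top) next to a horizontal
# tick label "10" that shares the height of the letter "A".
boxes = [
    Box("m", 5.0, 50.0, 5.0),
    Box("A", 5.0, 45.0, 5.0),
    Box("1", 50.0, 45.0, 5.0),
    Box("0", 55.0, 45.0, 5.0),
]
# Selecting the band of the vertical label also picks up "1" and "0".
print(select_band(boxes, 45.0, 55.0))  # prints "A10m"
```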

I have attached a screenshot that illustrates the practical relevance of this
problem: in many datasheets (not only TI's), vertical text is used to label the
vertical axes of diagrams. Splitting these labels into single letters makes it
impossible to search for a specific diagram.
