[okular] [Bug 407133] Copy text from rotated pdf gives rubbish

David Hurka bugzilla_noreply at kde.org
Sun May 12 18:34:10 BST 2019


https://bugs.kde.org/show_bug.cgi?id=407133

--- Comment #2 from David Hurka <david.hurka at mailbox.org> ---
Created attachment 120017
  --> https://bugs.kde.org/attachment.cgi?id=120017&action=edit
Diagonal text is not recognized as line

Looking into core/textpage.cpp tells me that the generators just output
characters with their bounding rectangles. (This information becomes
TinyTextEntity objects.) There seems to be no information about orientation.
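
To illustrate why a bare bounding rectangle cannot carry orientation, here is a
minimal sketch. It is not Okular code: Point, Rect, rotatedCorners() and
boundingRect() are hypothetical names made up for this example. It rotates a
glyph box and reduces it to an axis-aligned rectangle, which is presumably
close to what a consumer of the generator output gets to see.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-ins for illustration only, not Okular types.
struct Point { double x, y; };
struct Rect { double left, top, right, bottom; };

// Corners of a w x h glyph box centered at (cx, cy), rotated by `radians`.
std::array<Point, 4> rotatedCorners(double cx, double cy, double w, double h, double radians)
{
    assert(w > 0 && h > 0);
    const double c = std::cos(radians);
    const double s = std::sin(radians);
    const std::array<Point, 4> local = {{
        {-w / 2, -h / 2}, {w / 2, -h / 2}, {w / 2, h / 2}, {-w / 2, h / 2}
    }};
    std::array<Point, 4> out{};
    for (std::size_t i = 0; i < local.size(); ++i) {
        out[i] = {cx + local[i].x * c - local[i].y * s,
                  cy + local[i].x * s + local[i].y * c};
    }
    return out;
}

// Smallest axis-aligned rectangle that contains all four corners.
// The rotation angle cannot be recovered from the result.
Rect boundingRect(const std::array<Point, 4> &corners)
{
    Rect r{corners[0].x, corners[0].y, corners[0].x, corners[0].y};
    for (const Point &p : corners) {
        r.left = std::min(r.left, p.x);
        r.top = std::min(r.top, p.y);
        r.right = std::max(r.right, p.x);
        r.bottom = std::max(r.bottom, p.y);
    }
    return r;
}
```

With these helpers, a box rotated by +0.5 rad and the same box rotated by
-0.5 rad yield the same rectangle, so anything that only receives rectangles
cannot tell the two text directions apart.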

There are some functions in core/textpage.cpp whose code I haven’t read yet:


removeSpace()
Claims to remove spaces, to make the output from different generators uniform.

makeWordFromCharacters()
Claims to group characters into words, using spaces to distinguish between
adjacent words. (But spaces are removed?)

makeAndSortLines()
Claims to look for adjacent words to make a line of them, and to sort the
lines.

calculateStatisticalInformation()
Claims to be able to distinguish between character spacing, word spacing, and
column spacing. Needed for multi-column layouts.

XYCutForBoudingBoxes()
Claims to apply the XY-cut algorithm, to separate... something.

addNecessarySpace()
Inserts the spaces that were probably removed by removeSpace(), so selecting
text does not result in words that are squashed together.

TextPagePrivate::correctTextOrder()
Calls the above, statically declared functions.


Unfortunately, these functions don’t seem to be designed for vertical text.
Even slightly diagonal text causes problems, see screenshot. (Possible reasons:
XY-cut can’t “see” diagonal text, or makeAndSortLines() collects characters in
a bad order.)
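
As a rough illustration of the second suspected reason, the sketch below groups
glyph boxes into lines by vertical overlap. This is an assumed heuristic, not
the actual logic of makeAndSortLines(), which has not been checked here. Box,
layOutGlyphs(), verticalOverlap(), countLines() and the 0.5 threshold are all
hypothetical, and the glyph boxes are simplified to unrotated squares.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned glyph box, not an Okular type.
struct Box { double left, top, right, bottom; };

// Place `count` square glyphs of edge length `size` along a baseline that is
// tilted by `radians`; `advance` is the distance between glyph origins.
std::vector<Box> layOutGlyphs(int count, double size, double advance, double radians)
{
    assert(count >= 0 && size > 0);
    std::vector<Box> boxes;
    for (int i = 0; i < count; ++i) {
        const double x = i * advance * std::cos(radians);
        const double y = i * advance * std::sin(radians);
        boxes.push_back({x, y, x + size, y + size});
    }
    return boxes;
}

// Shared vertical extent of two boxes, as a fraction of the smaller height.
double verticalOverlap(const Box &a, const Box &b)
{
    const double shared = std::min(a.bottom, b.bottom) - std::max(a.top, b.top);
    const double smaller = std::min(a.bottom - a.top, b.bottom - b.top);
    return smaller > 0 ? std::max(0.0, shared) / smaller : 0.0;
}

// Count the lines found when a glyph starts a new line as soon as it overlaps
// the first glyph of the current line by less than `threshold`.
int countLines(const std::vector<Box> &boxes, double threshold = 0.5)
{
    if (boxes.empty()) {
        return 0;
    }
    int lines = 1;
    Box anchor = boxes.front();
    for (std::size_t i = 1; i < boxes.size(); ++i) {
        if (verticalOverlap(anchor, boxes[i]) < threshold) {
            ++lines;
            anchor = boxes[i];
        }
    }
    return lines;
}
```

Under these assumptions, countLines(layOutGlyphs(10, 10, 10, 0.0)) finds a
single line, while the same ten glyphs on a baseline tilted by 0.3 rad are
split into several "lines", which would scramble the order of copied text.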

There are many commits touching these functions, mainly made in 2011 by Albert
Astals Cid and Mohammad Mahfuzur Rahman Mamun. This commit was probably the
first one?

> commit 2eb5f270fd4befb6a84ff2e9bdd921271930e046
> Author: Mohammad Mahfuzur Rahman Mamun <mamun.nightcrawelr at gmail.com>
> Date:   Mon Jun 27 19:58:24 2011 +0600
> 
>     three functions added in textpage
> 
> [snip a lot]

Maybe these two people can give more information on how vertical text is
supposed to be handled.
