How to get all characters on a page in Words and their position

Friedrich W. H. Kossebau kossebau at kde.org
Tue Sep 30 13:20:05 BST 2014


Hi Pierre,

On Tuesday, 30 September 2014, 08:04:44, Pierre wrote:
> On Monday, September 29, 2014 11:22:48 PM Friedrich W. H. Kossebau wrote:
> > I would like to create a list of all visible characters on a given page
> > and their positions relative to that page's borders.
> > 
> > What is the best way to do that?
> > 
> > Background:
> > As you might have seen, I have pulled Sven's ODT generator for Okular from
> > an attic branch and pushed it next to the ODP generator. I talked to the
> > Okular people at Akademy and they are quite happy about that, as once
> > released it will also meet a wider demand for DOC(X) support in Okular.
> > Most features, like navigation by TOC, already work, but at least one
> > important thing is still missing: selection of text for copying.
> > 
> > Because Okular was designed around PDFs, this is done through an interface
> > to the generator which exports the text as described above: as a list of
> > chars and their positions. So there is no native selection done by the
> > generator, even if that could provide a better experience (surely someone
> > is welcome to extend Okular to also support native selection ;) ). See
> > here for the API I need to support:
> > 
> > http://api.kde.org/4.x-api/kdegraphics-apidocs/okular/html/classOkular_1_1TextPage.html#a003032e4e1cd8c15f01ed639ce62d11f
> > 
> > So I start from
> > 
> >         KWPage page = pageManager->page(okularPage->number()+1);
> > 
> > and then how do I get all the text frames of that page, and what is the
> > best way to calculate the distance of each char to the page borders?
> 
> Hi
> 
> Pages are sort of «pointers», empty shells. They exist to lay out KoShape
> objects one after another. So you have to go back to your shapeManager
> and use shapesAt(page.contentRect()).
> This gives you the shapes on the page, which you can then dynamic_cast
> to TextShape objects, whose textShapeData holds a QTextDocument object.
> That should get you the text of a given page, as far as I remember.
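
A minimal sketch of the shapesAt()-plus-dynamic_cast pattern described above, written as standalone C++ so it compiles without Calligra. The classes RectF, Shape and TextShape and the free function shapesAt() here are hypothetical stand-ins, not the real KoShape, TextShape or KoShapeManager API; in real code the shape list would come from the shape manager and the text from the QTextDocument behind textShapeData().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for KoShape and TextShape, reduced to what the
// filtering pattern needs. They are NOT the Calligra classes.
struct RectF {
    double x, y, w, h;
    // True if the two rectangles overlap (touching edges do not count).
    bool intersects(const RectF &o) const {
        return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
    }
};

struct Shape {
    RectF bounds;  // position in document coordinates
    explicit Shape(RectF b) : bounds(b) {}
    virtual ~Shape() = default;  // polymorphic, so dynamic_cast works
};

struct TextShape : Shape {
    std::string text;  // stands in for the QTextDocument reached via textShapeData()
    TextShape(RectF b, std::string t) : Shape(b), text(std::move(t)) {}
};

// Rough analogue of asking the shape manager for all shapes in a rect.
std::vector<const Shape *> shapesAt(const std::vector<std::unique_ptr<Shape>> &all,
                                    const RectF &rect) {
    std::vector<const Shape *> hits;
    for (const auto &s : all)
        if (s->bounds.intersects(rect))
            hits.push_back(s.get());
    return hits;
}

// Keeps only the text shapes among those found on the page.
std::vector<const TextShape *> textShapesOnPage(
        const std::vector<std::unique_ptr<Shape>> &all, const RectF &pageRect) {
    assert(pageRect.w > 0 && pageRect.h > 0);
    std::vector<const TextShape *> result;
    for (const Shape *s : shapesAt(all, pageRect))
        if (const auto *ts = dynamic_cast<const TextShape *>(s))
            result.push_back(ts);
    return result;
}
```

With two text frames on consecutive pages and a plain Shape (say, a picture) on the first, asking for page 1's rect returns only the first text frame; the picture is dropped by the dynamic_cast. A frame that merely overlaps the page edge is still returned, so a per-character check may be needed afterwards.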

Thanks, that does not sound too difficult.

> Regarding the distance calculation, I don't really understand what you
> want to do. Do you want to be able to get, for any character, its position
> on the page?

Yes. It sounds insane, but that is how Okular needs the data, due to its
internal abstractions.
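
Since Okular's NormalizedRect describes an area as fractions of the page's width and height, the per-character work reduces to mapping each character box from document coordinates into the 0..1 range of its page. The sketch below assumes, hypothetically, that every character box is already available in the same document coordinate system as the page rect (for example with pages stacked vertically, so a later page's rect is offset in y); extracting those boxes from the text layout is not shown. RectF, NormRect, CharBox, TextEntity, normalizeToPage() and buildTextEntities() are illustrative names, not Okular or Calligra API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical types for illustration; not Calligra or Okular API.
struct RectF { double x, y, w, h; };                    // document coordinates (pt)
struct NormRect { double left, top, right, bottom; };   // 0..1, page-relative
struct CharBox { char32_t ch; RectF rect; };            // one laid-out character
struct TextEntity { char32_t ch; NormRect area; };      // what the text page receives

// Clamp to [0, 1] so glyphs slightly over the page edge stay in range.
static double clamp01(double v) { return std::min(1.0, std::max(0.0, v)); }

// Express a character's box as fractions of the page's width and height,
// measured from the page's top-left corner.
NormRect normalizeToPage(const RectF &c, const RectF &page) {
    assert(page.w > 0 && page.h > 0);
    return NormRect{
        clamp01((c.x - page.x) / page.w),
        clamp01((c.y - page.y) / page.h),
        clamp01((c.x + c.w - page.x) / page.w),
        clamp01((c.y + c.h - page.y) / page.h),
    };
}

// Turn every character on the page into a (character, area) entry.
std::vector<TextEntity> buildTextEntities(const std::vector<CharBox> &chars,
                                          const RectF &page) {
    std::vector<TextEntity> out;
    out.reserve(chars.size());
    for (const CharBox &c : chars)
        out.push_back(TextEntity{c.ch, normalizeToPage(c.rect, page)});
    return out;
}
```

For a 500 x 1000 pt page starting at y = 1000, a 10 x 20 box at (50, 1100) maps to left/top 0.1 and right/bottom 0.12. Clamping keeps glyphs that spill over the page edge inside the valid range, at the cost of slightly shrinking their area.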

Cheers
Friedrich



More information about the calligra-devel mailing list