How to get all characters on a page in Words and their position

C. Boemann cbo at boemann.dk
Tue Sep 30 13:02:49 BST 2014


On Tuesday 30 September 2014 08:04:44 Pierre wrote:
> On Monday, September 29, 2014 11:22:48 PM Friedrich W. H. Kossebau wrote:
> > Hi,
> > 
> > I would like to create a list of all characters (visible) on a given page
> > and their position relative to that page's borders.
> > 
> > How do I do that best?
> > 
> > Background:
> > As you might have seen I have pulled Sven's ODT generator for Okular from
> > an attic branch and pushed it next to the ODP generator. Talked to the
> > Okular people at Akademy and they are quite happy about that, as it will,
> > once released, also meet some bigger request for support of DOC(X) in
> > Okular. Most features like navigation-by-toc already work, but at least
> > one important thing is still missing:
> > selection of text for copying.
> > 
> > Due to Okular being started around PDFs this is done by an interface to the
> > generator which exports the text as described above, as a list of chars and
> > their position. So no native selection done by the generator, even if that
> > could provide better experience (surely someone is welcome to extend
> > Okular to also support native selection ;) ). See here for the API I need
> > to support:
> > 
> > http://api.kde.org/4.x-api/kdegraphics-apidocs/okular/html/classOkular_1_1TextPage.html#a003032e4e1cd8c15f01ed639ce62d11f
> > 
> > So I start from
> > 
> >         KWPage page = pageManager->page(okularPage->number()+1);
> > 
> > and then how do I get all the text frames of that page and how do I best
> > calculate the distance of each char to the page borders?
> 
> Hi
> 
> Pages are sort of «pointers», empty shells. They exist to lay out KoShape
> objects one after another. So you have to go back to your shapeManager
> and use shapesAt(page->contentRect()).
> This gives you the shapes on the page, which you can then dynamically
> cast to TextShape objects, whose textShapeData contains a QTextDocument
> object. That should get you the text of a given page, as far as I remember.
> Regarding distance calculation, I don't really understand what you want to
> do. Do you want to be able to get, for any character, its position on the
> page?
> 
>  Pierre
Correct, except that the QTextDocument you get can span several pages, so for
each TextShape you need to use its KoTextLayoutArea and parse down through it.
That isn't straightforward: in the text ABCD you may find that A and D are on
your page while B and C are on the previous page.
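
To illustrate the filtering step, here is a rough sketch in plain C++. It does
not use the Calligra or Okular API: Rect, PlacedChar, PageChar and charsOnPage
are made-up stand-ins, and it assumes you have already walked the layout and
collected each character's bounding box in document coordinates. It only shows
the idea of testing every character individually against the page rectangle
(since neighbours in the text need not be on the same page) and then
translating the box so it is relative to the page's top-left corner.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a rectangle in document coordinates (think QRectF).
struct Rect {
    double x, y, width, height;

    // True if the two rectangles overlap with a non-empty area.
    bool intersects(const Rect &o) const {
        return x < o.x + o.width && o.x < x + width &&
               y < o.y + o.height && o.y < y + height;
    }
};

// Hypothetical: one laid-out character and its box in document coordinates.
struct PlacedChar {
    char ch;
    Rect box;
};

// A character with its box expressed relative to the page's top-left corner.
struct PageChar {
    char ch;
    Rect box;
};

// Keep only the characters whose box overlaps the page, in text order,
// and shift their boxes into page-relative coordinates.
std::vector<PageChar> charsOnPage(const std::vector<PlacedChar> &laidOut,
                                  const Rect &page)
{
    assert(page.width > 0 && page.height > 0);

    std::vector<PageChar> result;
    for (const PlacedChar &c : laidOut) {
        // Test every character on its own: A and D may be on this page
        // while B and C, which sit between them in the text, are not.
        if (!c.box.intersects(page))
            continue;
        result.push_back({c.ch, {c.box.x - page.x, c.box.y - page.y,
                                 c.box.width, c.box.height}});
    }
    return result;
}
```

With the ABCD case from above (A and D placed on the second page, B and C on
the first), calling charsOnPage with the second page's rectangle returns only
A and D, with their y offsets measured from the top of that page.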



More information about the calligra-devel mailing list