How to get all characters on a page in Words and their position

Pierre pinaraf at pinaraf.info
Tue Sep 30 07:04:44 BST 2014


On Monday, September 29, 2014 11:22:48 PM Friedrich W. H. Kossebau wrote:
> Hi,
> 
> I would like to create a list of all (visible) characters on a given page and
> their positions relative to that page's borders.
> 
> How do I do that best?
> 
> Background:
> As you might have seen I have pulled Sven's ODT generator for Okular from an
> attic branch and pushed it next to the ODP generator. Talked to the Okular
> people at Akademy and they are quite happy about that, as it will, once
> released, also meet some bigger request for support of DOC(X) in Okular. Most
> features like navigation-by-toc already work, but at least one important thing
> is still missing:
> selection of text for copying.
> 
> Due to Okular being started around PDFs this is done by an interface to the
> generator which exports the text as described above, as a list of chars and
> their position. So no native selection done by the generator, even if that
> could provide better experience (surely someone is welcome to extend Okular to
> also support native selection ;) ). See here for the API I need to support:
> 
> http://api.kde.org/4.x-api/kdegraphics-apidocs/okular/html/classOkular_1_1TextPage.html#a003032e4e1cd8c15f01ed639ce62d11f
> 
> So I start from
>         KWPage page = pageManager->page(okularPage->number()+1);
> and then how do I get all the text frames of that page and how do I best
> calculate the distance of each char to the page borders?

Hi

Pages are sort of «pointers», empty shells. They are only there to lay out
KoShape objects one after another. So you have to go back to your shapeManager
and use shapesAt(page->contentRect()).
That gives you the shapes on the page, which you can then dynamic_cast to
TextShape objects, whose textShapeData contains a QTextDocument object.
That should get you the text of a given page, as far as I remember.
Regarding the distance calculation, I don't really understand what you want to
do. Do you want to be able to get, for any character, its position on the page?
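
To make the lookup above concrete, here is a rough, self-contained sketch of
the pattern: ask for the shapes intersecting the page rectangle, keep only the
ones that dynamic_cast to a text shape, and shift each character box by the
page origin to get its distance from the page's left and top borders. All the
types below (Rect, Shape, TextShapeStandIn, Glyph, PlacedChar) and the free
functions shapesAt() and charsOnPage() are hypothetical stand-ins written for
this example, not Calligra's real classes or signatures. It also assumes that
the page rectangle and shape bounds share one document coordinate system and
that per-character boxes are already known, which in practice would have to
come from the text layout.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a rectangle in document coordinates.
struct Rect {
    double x, y, width, height;
    bool intersects(const Rect &o) const {
        return x < o.x + o.width && o.x < x + width &&
               y < o.y + o.height && o.y < y + height;
    }
};

// Hypothetical stand-in playing the role of a generic shape.
struct Shape {
    Rect bounds; // bounding rectangle in document coordinates
    explicit Shape(Rect b) : bounds(b) {}
    virtual ~Shape() = default;
};

// One character and its box, relative to the owning shape's top-left corner.
struct Glyph {
    char ch;
    Rect box;
};

// Hypothetical stand-in playing the role of a text shape.
struct TextShapeStandIn : Shape {
    std::vector<Glyph> glyphs;
    TextShapeStandIn(Rect b, std::vector<Glyph> g)
        : Shape(b), glyphs(std::move(g)) {}
};

// A character with its distance from the page's left and top borders.
struct PlacedChar {
    char ch;
    double left, top;
};

// Returns every shape whose bounds intersect the given area
// (models the idea of a shapesAt(rect) query, nothing more).
std::vector<const Shape *> shapesAt(
    const std::vector<std::unique_ptr<Shape>> &all, const Rect &area)
{
    std::vector<const Shape *> hits;
    for (const auto &s : all) {
        if (s->bounds.intersects(area))
            hits.push_back(s.get());
    }
    return hits;
}

// Collects the characters of all text shapes on the page, page-relative.
std::vector<PlacedChar> charsOnPage(
    const std::vector<std::unique_ptr<Shape>> &all, const Rect &page)
{
    assert(page.width > 0 && page.height > 0);
    std::vector<PlacedChar> out;
    for (const Shape *s : shapesAt(all, page)) {
        const auto *text = dynamic_cast<const TextShapeStandIn *>(s);
        if (!text)
            continue; // not a text shape, nothing to export
        for (const Glyph &g : text->glyphs) {
            // shape-relative -> document -> page-relative
            out.push_back({g.ch,
                           s->bounds.x + g.box.x - page.x,
                           s->bounds.y + g.box.y - page.y});
        }
    }
    return out;
}
```

If something like this matches what you are after, the open part is where the
per-character boxes come from; the sketch simply takes them as given.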

 Pierre