a new library for traversing odf files and a new export filter

matus.uzak at gmail.com
Mon Mar 25 16:54:53 GMT 2013


Hi,

Sorry for not joining the discussion earlier, but I have not had much free
time over the last two weeks.

I think we should continue the discussion about parser types, so that we
can also improve the state of things in libmsooxml.  What we have there is a
PULL parser, and I have identified the following problems (it would be cool
if Lassi could check them):

1. OOXML sometimes requires us to run the parser twice over a single
element: first to collect the information needed to convert the content of
its child elements, and then to do the actual conversion.

2. There are situations where converting the first child of the root
element requires information from the last child of the root element.

3. The interpretation of OOXML elements differs based on the namespace, and
that happens within the scope of a single filter implementation (a single
filter has to deal with several namespaces; the docx filter, for example,
handles WordprocessingML, DrawingML and VML).  That forces us to maintain a
context in order to interpret attribute values properly.  There might also
be totally different child elements.  It's good that the namespace is always
checked, because that avoids creating invalid ODF, but it also means that an
element in an unexpected namespace is ignored.

4. Variations of 1, 2 and 3.
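To make points 1 and 2 concrete, here is a minimal Python sketch (not libmsooxml code, which is C++) of the two-pass workaround with a streaming parser. It assumes, purely for illustration, that the information we need is the page size in the w:sectPr that ends w:body: a first pass collects it, and a second pass converts the paragraphs using it. The toy document, the function names and the (text, width) output are hypothetical placeholders for real ODF output.

```python
# Hypothetical sketch of the two-pass workaround (points 1 and 2); not libmsooxml code.
import io
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(local: str) -> str:
    """Qualified name in ElementTree's Clark notation: {namespace-uri}local."""
    return f"{{{W}}}{local}"


def collect_section_props(data: bytes) -> dict:
    """Pass 1: stream through the document and remember the attributes of
    the last w:pgSz (here it sits in the w:sectPr at the end of w:body)."""
    props = {}
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == q("pgSz"):
            props = {k.split("}")[-1]: v for k, v in elem.attrib.items()}
    return props


def convert(data: bytes) -> list:
    """Pass 2: convert each w:p, using information that appears *after*
    the paragraphs in the file and was therefore collected in pass 1."""
    page = collect_section_props(data)
    out = []
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == q("p"):
            text = "".join(t.text or "" for t in elem.iter(q("t")))
            out.append((text, page.get("w", "?")))  # placeholder for real ODF output
            elem.clear()  # drop the paragraph's children once it is converted
    return out


# Toy input, not a complete DOCX part.
DOC = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>Hello</w:t></w:r></w:p>
<w:sectPr><w:pgSz w:w="11906" w:h="16838"/></w:sectPr>
</w:body></w:document>""".encode()

print(convert(DOC))  # [('Hello', '11906')]
```

The price is that the input is read twice, which is exactly the kind of code that gets fluffy once several such dependencies pile up.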

It sounds like we need to adopt some characteristics of a SAX parser in
order to solve point 3.  And the code gets a bit fluffy when we try to solve
points 1, 2 and 4, because multiple passes and look-ahead are not
characteristics of a PULL parser.
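For point 3, a namespace-aware SAX handler can keep its own context stack. The sketch below is again a hypothetical Python illustration, using the standard xml.sax module: it dispatches on (namespace URI, local name), reads colour attributes differently for DrawingML (a:srgbClr/@val) and VML (v:rect/@fillcolor), and records elements in an unexpected namespace together with their enclosing context instead of dropping them silently. The class name, the labels and the input fragment are assumptions; the fragment is not schema-valid DOCX.

```python
# Hypothetical sketch of namespace-dependent interpretation with a context stack.
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
V = "urn:schemas-microsoft-com:vml"
KNOWN = {W: "wml", A: "drawingml", V: "vml"}


class ColorHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.stack = []       # namespace URI of each currently open element
        self.colors = []      # (vocabulary, raw colour value)
        self.unexpected = []  # (enclosing vocabulary, element URI, local name)

    def context(self):
        """Innermost open element that belongs to a known vocabulary."""
        for uri in reversed(self.stack):
            if uri in KNOWN:
                return KNOWN[uri]
        return None

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri not in KNOWN:
            # Report the element instead of silently ignoring it.
            self.unexpected.append((self.context(), uri, local))
        elif uri == A and local == "srgbClr":
            # DrawingML: hex RRGGBB in the un-prefixed 'val' attribute.
            self.colors.append(("drawingml", attrs.get((None, "val"))))
        elif uri == V and local == "rect":
            # VML: colour in the un-prefixed 'fillcolor' attribute.
            self.colors.append(("vml", attrs.get((None, "fillcolor"))))
        self.stack.append(uri)

    def endElementNS(self, name, qname):
        self.stack.pop()


def scan(data: bytes) -> ColorHandler:
    handler = ColorHandler()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)  # deliver (uri, local) names
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler


SAMPLE = f"""<w:body xmlns:w="{W}" xmlns:a="{A}" xmlns:v="{V}" xmlns:x="urn:example:ext">
<a:solidFill><a:srgbClr val="FF0000"/></a:solidFill>
<v:rect fillcolor="#00ff00"><x:extra/></v:rect>
</w:body>""".encode()

h = scan(SAMPLE)
print(h.colors)      # [('drawingml', 'FF0000'), ('vml', '#00ff00')]
print(h.unexpected)  # [('vml', 'urn:example:ext', 'extra')]
```

The handler still only sees each element once, so it helps with point 3 but does nothing for the look-ahead in points 1 and 2.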

We will also have to deal with this when doing the ODF->OOXML conversion.
As Inge wrote, the current plan is to export text and simple formatting to
DOCX, but I'm afraid we will soon hit one of these problems.

I have also read Jos's comments about using XSLT to do the conversion.
Do you think it would be easier to solve points 1, 2, 3 and 4 that way?
When I imagine the code in XSLT using XPath, it could be OK, but not that
OK in terms of performance.
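For comparison with the XSLT idea: XSLT/XPath processors typically work on an in-memory tree, so point 2 turns into a simple look-up, but the whole document has to be loaded first. A rough Python approximation of that trade-off, using ElementTree's limited XPath support rather than a real XSLT processor (the function name and toy input are hypothetical):

```python
# Hypothetical tree-based variant, approximating what an XSLT/XPath approach works on.
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def convert_with_tree(data: bytes) -> list:
    root = ET.fromstring(data)  # the entire document is now in memory
    # Look-ahead to the end of w:body becomes a single XPath-like query.
    pg = root.find(".//w:sectPr/w:pgSz", NS)
    width = pg.get(f"{{{W}}}w", "?") if pg is not None else "?"
    return [
        ("".join(t.text or "" for t in p.iter(f"{{{W}}}t")), width)
        for p in root.iterfind(".//w:p", NS)
    ]


DOC = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>Hello</w:t></w:r></w:p>
<w:sectPr><w:pgSz w:w="11906" w:h="16838"/></w:sectPr>
</w:body></w:document>""".encode()

print(convert_with_tree(DOC))  # [('Hello', '11906')]
```

The conversion logic is shorter than the two-pass version, but memory use grows with the size of the document rather than the size of one paragraph.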

br,

Matus Uzak


More information about the calligra-devel mailing list