a new library for traversing odf files and a new export filter

Sebastian Sauer mail at dipe.org
Tue Mar 26 09:32:58 GMT 2013


On 03/26/2013 02:51 PM, Lassi Nieminen wrote:
> Hola,
>
> On Mon, Mar 25, 2013 at 8:12 PM, Inge Wallin <inge at lysator.liu.se> wrote:
>
>     On Monday, March 25, 2013 17:54:53 matus.uzak at gmail.com wrote:
>     > Hi,
>     >
>     > sorry for not discussing earlier, but I did not have much free time
>     > the last two weeks.
>     >
>     > I think we should continue the parser type discussion in order to also
>     > improve state of things in libmsooxml.  What we have there is a PULL
>     > parser. And I identified the following problems (Would be cool if Lassi
>     > could check those):
>     >
>     > 1. OOXML sometimes requires us to run the parser twice at one element
>     > in order to first collect selected information required to convert the
>     > content of child elements.
>     >
>     > 2. There are situations when conversion of the 1st child of the root
>     > element requires information from the last child of the root element.
>
>     It would be interesting to see some examples of these two issues.
>
>
> As an example : in pptx files, in slides,
> there can be text which is specified to use theme color lt1
>
> Don't remember the exact syntax, but something like
> <p>
> <rPr "color" = "lt1"/>
> <r>Hejsan</r>
> </p>
>
> Then as the last element of that slide there may or may not be
> <clrMap "lt1" = "bg1" ...../> // or something similar
>
> Which means that lt1 should be interpreted to be bg1 for this 
> particular slide.
> Currently what we're doing is that we first read the slide once, 
> skipping everything
> except clrMap. Then we read the slide again (yay!) and start the real 
> conversion.
>
> There was something similar in xlsx filters too if my memory serves me 
> correctly.
>

See also the somewhat related XmlWriteBuffer in 
filters/libmsooxml/MsooXmlUtils.h, which is used "when information that 
has to be written in advance is based on XML elements parsed later.  In 
such case the information cannot be saved in one pass" for OOXML=>ODF.
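
The general idea behind such a buffer can be sketched as follows. This is 
not the XmlWriteBuffer API, only a minimal Python illustration of deferring 
output until late information has been parsed, applied to Lassi's clrMap 
example. The element and attribute names are simplified stand-ins rather 
than real PresentationML syntax, and convert_slide() and the <span> output 
are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical slide: simplified stand-in names, not real PresentationML.
SLIDE = """
<slide>
  <p><r color="lt1">Hejsan</r></p>
  <p><r color="dk1">Hello</r></p>
  <clrMap lt1="bg1"/>
</slide>
"""


def convert_slide(xml_text):
    """Converts the slide in one pass, deferring color resolution."""
    buffered = []   # output fragments held back until the map is known
    color_map = {}  # filled in when <clrMap> is finally seen

    for child in ET.fromstring(xml_text):
        if child.tag == "clrMap":
            color_map.update(child.attrib)
        elif child.tag == "p":
            for run in child.iter("r"):
                # Store the unresolved color name next to the text.
                buffered.append((run.get("color"), run.text))

    # Flush the buffer now that every mapping is available; color names
    # without a mapping are kept unchanged.
    return [
        '<span color="%s">%s</span>' % (color_map.get(color, color), text)
        for color, text in buffered
    ]


if __name__ == "__main__":
    for line in convert_slide(SLIDE):
        print(line)
```

The slide is read only once; the cost is that the converted content has to 
be held in memory until the element that completes it has been seen.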

In the case of XSLT I also remember that there was a problem with 
offset references, meaning something like (pseudo-XML):

<style>
   <item>index 0</item>
   <item>index 1</item>
   <item>index 2</item>
</style>

<content>
   <content withStyleIndex="1"/> // where 1 refers to the second style item
</content>

XSLT does, iirc, not allow fetching such index-based references directly, 
which makes it necessary to for-loop with a counter over the <style> items 
every time they are referenced. That is super expensive and, iirc, no 
caching is done (my knowledge there is a few years old, so maybe that has 
changed). It is a classic case where one would like to introduce a 
"caching concept": read all the items at once, prepare them, and access 
them later directly by index from a style container/manager. OOXML makes 
quite a lot of use of such index-based references, being a 1:1 port from 
C/C++ to XML.
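
A minimal sketch of that caching idea, in Python rather than XSLT or the 
filters' C++: the style items are read once into a list and afterwards 
resolved by index in constant time. The element and attribute names follow 
the pseudo-XML above; the wrapping <doc> root, the text inside the content 
elements, and the StyleCache class are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document modelled on the pseudo-XML above; the <doc> root
# is an assumption added so the snippet parses as one well-formed file.
DOC = """
<doc>
  <style>
    <item>index 0</item>
    <item>index 1</item>
    <item>index 2</item>
  </style>
  <content>
    <content withStyleIndex="1">first</content>
    <content withStyleIndex="2">second</content>
    <content withStyleIndex="1">third</content>
  </content>
</doc>
"""


class StyleCache:
    """Reads all <item> children of <style> once and serves them by index."""

    def __init__(self, style_element):
        # One linear scan up front instead of one scan per reference.
        self._items = [item.text for item in style_element.findall("item")]

    def by_index(self, index):
        # List indexing is O(1); an index past the end raises IndexError.
        return self._items[index]


def resolve_styles(xml_text):
    root = ET.fromstring(xml_text)
    cache = StyleCache(root.find("style"))
    resolved = []
    # Only the inner <content> elements carry a style reference.
    for node in root.iterfind("./content/content"):
        index = int(node.get("withStyleIndex"))
        resolved.append((node.text, cache.by_index(index)))
    return resolved


if __name__ == "__main__":
    for text, style in resolve_styles(DOC):
        print(text, "->", style)
```

With n style items and m references, this does n + m steps instead of the 
roughly n * m steps of looping over the styles for every reference.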


More information about the calligra-devel mailing list