a new library for traversing odf files and a new export filter

Sebastian Sauer mail at dipe.org
Tue Mar 26 09:52:59 GMT 2013


On 03/26/2013 04:32 PM, Sebastian Sauer wrote:
> On 03/26/2013 02:51 PM, Lassi Nieminen wrote:
>> Hola,
>>
>> On Mon, Mar 25, 2013 at 8:12 PM, Inge Wallin <inge at lysator.liu.se> wrote:
>>
>>     On Monday, March 25, 2013 17:54:53 matus.uzak at gmail.com wrote:
>>     > Hi,
>>     >
>>     > sorry for not discussing earlier, but I did not have much free time
>>     > the last two weeks.
>>     >
>>     > I think we should continue the parser type discussion in order to
>>     > also improve the state of things in libmsooxml.  What we have there
>>     > is a PULL parser. And I identified the following problems (would be
>>     > cool if Lassi could check those):
>>     >
>>     > 1. OOXML sometimes requires us to run the parser twice at one element
>>     > in order to first collect selected information required to convert
>>     > the content of child elements.
>>     >
>>     > 2. There are situations when conversion of the 1st child of the root
>>     > element requires information from the last child of the root element.
>>
>>     It would be interesting to see some examples of these two issues.
>>
>>
>> As an example: in pptx files, slides can contain text
>> which is specified to use the theme color lt1
>>
>> Don't remember the exact syntax, but something like
>> <p>
>> <rPr "color" = "lt1"/>
>> <r>Hejsan</r>
>> </p>
>>
>> Then as the last element of that slide there may or may not be
>> <clrMap "lt1" = "bg1" ...../> // or something similar
>>
>> Which means that lt1 should be interpreted to be bg1 for this 
>> particular slide.
>> Currently what we're doing is that we first read the slide once, 
>> skipping everything
>> except clrMap. Then we read the slide again (yay!) and start the real 
>> conversion.
>>
>> There was something similar in xlsx filters too if my memory serves 
>> me correctly.
>>
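
A minimal sketch of the two-pass reading described above, in Python rather than the filter's C++, and using ElementTree as a stand-in for the pull parser. The markup is made up and simplified: the element and attribute names follow the recalled example, not real PresentationML, and collect_color_map and convert_runs are hypothetical helpers, not libmsooxml functions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified slide markup. Real pptx slides use namespaces
# and different element/attribute names.
SLIDE = """
<slide>
  <p>
    <rPr color="lt1"/>
    <r>Hejsan</r>
  </p>
  <clrMap lt1="bg1"/>
</slide>
"""

def collect_color_map(xml_text):
    """Pass 1: ignore everything except clrMap and collect its mappings."""
    mapping = {}
    for elem in ET.fromstring(xml_text).iter("clrMap"):
        mapping.update(elem.attrib)
    return mapping

def convert_runs(xml_text, color_map):
    """Pass 2: the real conversion, resolving theme colors via the map."""
    result = []
    for p in ET.fromstring(xml_text).iter("p"):
        rpr = p.find("rPr")
        color = rpr.get("color") if rpr is not None else None
        # Fall back to the original name when the slide has no override.
        resolved = color_map.get(color, color)
        for r in p.findall("r"):
            result.append((r.text, resolved))
    return result

color_map = collect_color_map(SLIDE)    # first read of the slide
runs = convert_runs(SLIDE, color_map)   # second read of the slide
print(runs)
```

The slide is parsed twice on purpose: the clrMap at the end has to be known before the first run can be converted.
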
>
> See also somewhat related XmlWriteBuffer in 
> filters/libmsooxml/MsooXmlUtils.h which is used "when information that 
> has to be written in advance is based on XML elements parsed later.  
> In such case the information cannot be saved in one pass" for OOXML=>ODF.
>
> In the case of XSLT I also remember that there was a problem with 
> offset references, meaning something like (pseudo-XML):
>
> <style>
>   <item>index 0</item>
>   <item>index 1</item>
>   <item>index 2</item>
> </style>
>
> <content>
>   <content withStyleIndex="1"/> // where 1 refers to the second style item
> </content>
>
> XSLT does, iirc, not allow such index-based reference fetching, making it
> necessary to for-loop with a counter over the <style> items every time
> they are referenced. Super expensive, and iirc no caching is done (my
> knowledge there is a few years old, so maybe that changed). A classic
> case where one would like to introduce a "caching concept": read
> all the items at once, prepare them and access them later on directly by
> index from a style container/manager. OOXML makes quite a lot of use of
> such index-based references, being a 1:1 port from C/C++ to XML.
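
The "caching concept" could be sketched roughly like this, in Python for brevity. This is only an illustration under assumptions: the markup is the pseudo-XML from above wrapped in a made-up <doc> root, and styles and style_for are hypothetical names, not an existing API.

```python
import xml.etree.ElementTree as ET

# The pseudo-XML from above, wrapped in a made-up <doc> root so that it
# parses as a single document.
DOC = """
<doc>
  <style>
    <item>index 0</item>
    <item>index 1</item>
    <item>index 2</item>
  </style>
  <content>
    <content withStyleIndex="1"/>
  </content>
</doc>
"""

root = ET.fromstring(DOC)

# Read all style items once and keep them in a list, so that every later
# reference is a direct lookup by index instead of a counting loop.
styles = [item.text for item in root.find("style")]

def style_for(elem):
    """Resolve an index-based style reference against the cached list."""
    return styles[int(elem.get("withStyleIndex"))]

resolved = [
    style_for(elem)
    for elem in root.iter("content")
    if elem.get("withStyleIndex") is not None
]
print(resolved)
```
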

Also somewhat related: years ago, when the CleverAge OOXML=>ODF converter 
sponsored by Microsoft appeared during the OOXML ISO battle, I investigated 
that code (for my diploma thesis, which had OOXML<=>ODF as its subject). It 
uses lots of intermediate steps (pre- and post-processing, multiple XSLT 
runs). Hard to say whether that is caused by ugly design decisions alone or 
driven by XSLT limitations (I would think both).

Code is still available at: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/
Readme: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Readme.txt?revision=5309&view=markup
The main converter lib: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/
The xsl's: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/resources/oox2odf/

It wasn't that bad, but I can confirm what Rob Weir's blog said back then: 
the converter needs >10x longer than anything else and is a memory monster.

> _______________________________________________
> calligra-devel mailing list
> calligra-devel at kde.org
> https://mail.kde.org/mailman/listinfo/calligra-devel
