a new library for traversing odf files and a new export filter
Sebastian Sauer
mail at dipe.org
Tue Mar 26 09:52:59 GMT 2013
On 03/26/2013 04:32 PM, Sebastian Sauer wrote:
> On 03/26/2013 02:51 PM, Lassi Nieminen wrote:
>> Hola,
>>
>> On Mon, Mar 25, 2013 at 8:12 PM, Inge Wallin <inge at lysator.liu.se> wrote:
>>
>> On Monday, March 25, 2013 17:54:53 matus.uzak at gmail.com wrote:
>> > Hi,
>> >
>> > sorry for not discussing this earlier, but I did not have much free time
>> > in the last two weeks.
>> >
>> > I think we should continue the parser type discussion in order to also
>> > improve the state of things in libmsooxml. What we have there is a PULL
>> > parser. And I identified the following problems (would be cool if Lassi
>> > could check those):
>> >
>> > 1. OOXML sometimes requires us to run the parser twice on one element in
>> > order to first collect selected information required to convert the
>> > content of child elements.
>> >
>> > 2. There are situations when conversion of the 1st child of the root
>> > element requires information from the last child of the root element.
>>
>> It would be interesting to see some examples of these two issues.
>>
>>
>> As an example: in pptx files, in slides, there can be text which is
>> specified to use the theme color lt1.
>>
>> I don't remember the exact syntax, but something like
>> <p>
>> <rPr color="lt1"/>
>> <r>Hejsan</r>
>> </p>
>>
>> Then as the last element of that slide there may or may not be
>> <clrMap lt1="bg1" ...../> // or something similar
>>
>> which means that lt1 should be interpreted as bg1 for this particular
>> slide. Currently what we're doing is that we first read the slide once,
>> skipping everything except clrMap. Then we read the slide again (yay!)
>> and start the real conversion.
>>
>> There was something similar in the xlsx filters too, if my memory serves
>> me correctly.
>>
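
The two-read approach Lassi describes can be sketched roughly as follows. This is only an illustration in Python, not the libmsooxml code: the element and attribute names (slide, p, rPr, r, clrMap) are the simplified ones from the example above, not real PresentationML, and the function names are made up for this sketch.

```python
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for a slide; the markup is an assumption, not real OOXML.
SLIDE = b"""<slide>
  <p><rPr color="lt1"/><r>Hejsan</r></p>
  <clrMap lt1="bg1"/>
</slide>"""


def collect_color_map(data):
    """Pass 1: skip everything except clrMap and return its mapping."""
    color_map = {}
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
        if elem.tag == "clrMap":
            color_map.update(elem.attrib)
    return color_map


def convert_runs(data, color_map):
    """Pass 2: the real conversion, resolving theme colors via color_map."""
    runs = []
    color = None
    for event, elem in ET.iterparse(io.BytesIO(data), events=("start", "end")):
        if event == "start" and elem.tag == "p":
            color = None  # run properties do not carry over between paragraphs
        elif event == "start" and elem.tag == "rPr":
            name = elem.get("color")
            # Fall back to the theme name itself when the slide has no override.
            color = color_map.get(name, name)
        elif event == "end" and elem.tag == "r":
            runs.append((elem.text, color))
    return runs


def convert_slide(data):
    """Read the slide twice: once for clrMap, once for the content."""
    return convert_runs(data, collect_color_map(data))
```

Here convert_slide(SLIDE) returns [("Hejsan", "bg1")]. Without the trailing clrMap element the run would keep lt1. The cost is the same as in the filter: every slide is parsed twice.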
>
> See also somewhat related XmlWriteBuffer in
> filters/libmsooxml/MsooXmlUtils.h which is used "when information that
> has to be written in advance is based on XML elements parsed later.
> In such case the information cannot be saved in one pass" for OOXML=>ODF.
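
The general idea behind such a buffer can be sketched like this. It is not the XmlWriteBuffer interface from MsooXmlUtils.h; the class and method names below are hypothetical and only show how output can be held back until later input has been seen.

```python
class DeferredWriter:
    """Collects output chunks and lets the caller fill reserved slots later.

    Hypothetical helper for illustration; not the libmsooxml API.
    """

    def __init__(self):
        self._chunks = []

    def write(self, text):
        """Append output that is already known."""
        self._chunks.append(text)

    def reserve(self):
        """Reserve a slot for output that depends on input parsed later."""
        self._chunks.append("")
        return len(self._chunks) - 1

    def fill(self, slot, text):
        """Fill a previously reserved slot."""
        self._chunks[slot] = text

    def getvalue(self):
        """Return the complete output in document order."""
        return "".join(self._chunks)


writer = DeferredWriter()
writer.write("<doc>")
slot = writer.reserve()           # the header must precede the body ...
writer.write("<body>a b c</body>")
writer.fill(slot, '<meta words="3"/>')  # ... but is only known afterwards
writer.write("</doc>")
result = writer.getvalue()
```

The price is that the whole buffered part has to be kept in memory until the missing information arrives.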
>
> In the case of XSLT I also remember that there was a problem with
> offset references. That means something like (pseudo-xml):
>
> <style>
> <item>index 0</item>
> <item>index 1</item>
> <item>index 2</item>
> </style>
>
> <content>
> <content withStyleIndex="1"/> // where 1 refers to the second style item
> </content>
>
> XSLT does, iirc, not allow such index-based reference fetching, which makes
> it necessary to for-loop with a counter over the <style> items every time
> they are referenced. Super expensive, and iirc no caching is done (my
> knowledge there is a few years old, so maybe that has changed). A classic
> case where one would just like to introduce a "caching concept": read
> all the items at once, prepare them, and access them later on directly by
> index from a style container/manager. OOXML makes quite a lot of use of
> such index-based references, being a 1:1 port from C/C++ to XML.
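
To make the "caching concept" concrete, here is a small sketch of what I mean, written in Python for brevity. The markup is the pseudo-xml from above wrapped in an assumed <doc> root, and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Pseudo-xml from the mail, wrapped in a root element (an assumption).
DOC = """<doc>
  <style>
    <item>index 0</item>
    <item>index 1</item>
    <item>index 2</item>
  </style>
  <content>
    <content withStyleIndex="1"/>
  </content>
</doc>"""


def build_style_table(root):
    """Read all style items once and keep them in a list, in document order."""
    return [item.text for item in root.find("style")]


def resolve_styles(root, styles):
    """Look up each referenced style directly by index, without rescanning."""
    resolved = []
    for elem in root.iter("content"):
        index = elem.get("withStyleIndex")
        if index is not None:
            resolved.append(styles[int(index)])
    return resolved


root = ET.fromstring(DOC)
styles = build_style_table(root)
resolved = resolve_styles(root, styles)
```

Each reference then costs one list lookup instead of one loop over all <style> items.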
Also somewhat related: it is hard to say whether this was caused by ugly
design decisions alone or driven by XSLT limitations (I would think both),
but years ago, when the CleverAge OOXML=>ODF converter sponsored by
Microsoft appeared during the OOXML ISO battle, I investigated that code
(for my diploma thesis, which had OOXML<=>ODF as its subject). It has lots
of intermediate steps (pre- and post-processing, multiple XSLT runs).
Code is still available at:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/
Readme:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Readme.txt?revision=5309&view=markup
The main converter lib:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/
The xsl's:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/resources/oox2odf/
It wasn't that bad, but I can confirm what Rob Weir's blog said back then:
the converter needs >10x longer than anything else and is a memory monster.