Review Request 109393: a new library for traversing odf files and a new export filter

Jarosław Staniek staniek at kde.org
Thu Mar 14 00:04:21 GMT 2013


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/109393/#review29164
-----------------------------------------------------------


Thanks for the effort. That's my partial review. I started with simple notes related to coding style and what not but then went into the essential topics, and these are the most important. If you'll agree I'd try to help to make this export tool complementary to the tools designed for the OOXML import filters (what mostly means using stream-readers/writers: QXmlStreamReader/QXmlStreamWriter)


filters/libodftraverse/OdfParser.h
<http://git.reviewboard.kde.org/r/109393/#comment21767>

    Doxygen comments would be better, everywhere.



filters/libodftraverse/OdfParser.h
<http://git.reviewboard.kde.org/r/109393/#comment21771>

    How about Ko prefix?



filters/libodftraverse/OdfParser.h
<http://git.reviewboard.kde.org/r/109393/#comment21768>

    For any new API let's recommend IN-OUT parameters passed via pointers (Qt API guildeline adopted by some of KDE). This comment is also about consistency - you passed odfStore here via pointer.



filters/libodftraverse/OdfParser.h
<http://git.reviewboard.kde.org/r/109393/#comment21769>

    same note as for line 55



filters/libodftraverse/OdfParser.cpp
<http://git.reviewboard.kde.org/r/109393/#comment21774>

    I wonder if the design is the best at all. The OOXML reader classes recursively check element names and perform translation for supported elements. No intermediate hash structures are filled, XML-stream-reading is performed as in pull-parser. They are confenient at first look, yes, but that's not the best approach performance-wise, and even wouldn't lead to noticeably smaller code. 
    
    When we convert FROM ODF, we can similarly translate on-demand without basically importing everything into memory.
    
    This way we would be able to import data very selectively and skip ignored data.
    
    I know the data for manifest, metadata and settings may be quite small, but styles would be already quite big. Moreover QHash<QString, QString> doesn't seem to model what styles are (there are various levels of styles in ODP for example). What container would you use for them?



filters/libodftraverse/OdfParser.cpp
<http://git.reviewboard.kde.org/r/109393/#comment21773>

    KDElibs4's kDebug() does not need extra spaces



filters/libodftraverse/OdtTraverser.h
<http://git.reviewboard.kde.org/r/109393/#comment21775>

    Not sure but how about OdtContentIterator?
    
    Compare http://en.wikipedia.org/wiki/XML#Pull_parsing



filters/libodftraverse/OdtTraverser.cpp
<http://git.reviewboard.kde.org/r/109393/#comment21777>

    begin*(), end() are all like SAX events.
    
    Compared to pull-parser, this API forces to use three separate blocks: 
    1. beginTag*() - the backend implementation would want to create some data structures (DT1) here
    2. handleInsideElementsTag() can call the same beginTag*() so there can be recursion, if so we have to protect the DT1 on a stack (with pull parsers we did not have this problem)
    3. endTag*() needs to somehow find the data structure and actually bin it to the rest of overall processing result, be it an in-memory strcture or simply an output stream.
    
    Unless I completely misunderstood, I feel we need to take a step back.
    I'd like to see as much as possible helper code for ODF reading _shared_ but in the same time the API would be on top of stream-reader, and for stream-writing.
    
    Please look at the key sentences from [http://en.wikipedia.org/wiki/XML#Pull_parsing]: "The recursive-descent approach tends to lend itself to keeping data as typed local variables in the code doing the parsing, while SAX, for instance, typically requires a parser to manually maintain intermediate data within a stack of elements which are parent elements of the element being parsed. Pull-parsing code can be more straightforward to understand and maintain than SAX parsing code."
    
    At this point, caffeinated at night, I am interrupting the review, hoping we can find best possible solutions ;)


- Jarosław Staniek


On March 13, 2013, 7:24 p.m., Inge Wallin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/109393/
> -----------------------------------------------------------
> 
> (Updated March 13, 2013, 7:24 p.m.)
> 
> 
> Review request for Calligra.
> 
> 
> Description
> -------
> 
> This patch creates a new library in filters/odftraverse. The purpose of this library is to create classes to make it easier to create export filters for ODF files. What you do to use this library is that you inherit a base class for backends to this parser / traverser and in the many callbacks you write the output that is relevant for your output format.
> 
> To show how it can be used I have also created a very simple proof of concept filter that exports to text format, something Calligra actually lacked before.
> 
> The current implementation traverses only ODT files and there are still a number of NYI functions that I want to finish before the actual merge. But I thought I'd get some opinions early. In other words, I expect at least one, maybe two iterations before this branch can be merged.
> 
> 
> Diffs
> -----
> 
>   filters/CMakeLists.txt 8bcd640 
>   filters/libodftraverse/CMakeLists.txt PRE-CREATION 
>   filters/libodftraverse/OdfParser.h PRE-CREATION 
>   filters/libodftraverse/OdfParser.cpp PRE-CREATION 
>   filters/libodftraverse/OdtTraverser.h PRE-CREATION 
>   filters/libodftraverse/OdtTraverser.cpp PRE-CREATION 
>   filters/libodftraverse/OdtTraverserBackend.h PRE-CREATION 
>   filters/libodftraverse/OdtTraverserBackend.cpp PRE-CREATION 
>   filters/words/ascii/AsciiExport.h PRE-CREATION 
>   filters/words/ascii/AsciiExport.cpp PRE-CREATION 
>   filters/words/ascii/CMakeLists.txt d36de47 
>   filters/words/ascii/OdtTraverserAsciiBackend.h PRE-CREATION 
>   filters/words/ascii/OdtTraverserAsciiBackend.cpp PRE-CREATION 
>   filters/words/ascii/TODO ceb1a24 
>   filters/words/ascii/words_ascii_export.desktop PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/109393/diff/
> 
> 
> Testing
> -------
> 
> Tested with a lengthy text file.
> 
> 
> Thanks,
> 
> Inge Wallin
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/calligra-devel/attachments/20130314/562600ee/attachment.htm>


More information about the calligra-devel mailing list