Review Request 109887: Create a stream reader better suited for ODF than QXmlStreamReader

Inge Wallin inge at lysator.liu.se
Tue Apr 9 10:05:15 BST 2013


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/109887/#review30755
-----------------------------------------------------------



libs/odf/KoXmlStreamReader.cpp
<http://git.reviewboard.kde.org/r/109887/#comment22881>

    This code is indeed slower than the plain QXmlStreamReader. Since it builds on top of it, it could not be otherwise. But you shouldn't compare this class with the plain stream reader on its own. You should compare QXmlStreamReader plus the code that calls it (i.e. the ODF parser on top of it) with this class plus the code that calls _it_.
    
    The purpose of this class is not primarily to make the code faster but to make it nicer. As I wrote in the description of the review request, it will result in code that is much easier to read. See the description for more details.
    
    Regarding the plans, I have three time frames in mind:
    
    Short term, I want to use this for the docx export filter. After good feedback on the ODF traversing library I realized that stream reading was a much better approach. I will port the ODF traverser to use stream reading and use that for the filter.
    
    Medium term, I think that we could port the KoXmlReader (DOM based) to KoXmlStreamReader. We would have to do extensive testing before doing that. This would let us remove the call to the extremely ugly fixNamespace() function and therefore make it a lot faster. Since KoXmlReader.setDocument() uses around 15% of the total parsing time, I think we could get a nice speedup there.
    
    Long term, it would be very nice if we could start to use stream reading in our loading code. But for that to happen we need to change the structure of loading so that we can combine DOM-based loading with stream-based loading. I don't have any particular plans, but I think the approach will mature as the current classes are used more.


- Inge Wallin


On April 6, 2013, 3:56 p.m., Inge Wallin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/109887/
> -----------------------------------------------------------
> 
> (Updated April 6, 2013, 3:56 p.m.)
> 
> 
> Review request for Calligra, Jarosław Staniek and Jos van den Oever.
> 
> 
> Description
> -------
> 
> This patch contains a new XML stream reader based on the QXmlStreamReader that is better suited for ODF.
> 
> Much ODF parsing code in Calligra looks like:
> 
>   if (el.namespaceUri() == KoXml::fo && el.name() == "border") { ... }
> 
> The reason for this complicated construction is that the prefix (the "fo" in "fo:border") is not fixed but is declared at the beginning of each XML file. Even though "fo" is the normal prefix, there is no guarantee that it is the same in every document.
> 
> However, it is a very rare document where it is *not* the normal prefix, so what we want to do is to be able to write code like this:
> 
>   if (el.qualifiedName() == "fo:border") { ... }
> 
> and make the XML stream reader or DOM tree rewrite the qualified name in the very rare cases where the prefix does not match what we want.
> 
> This is exactly what the KoXmlStreamReader does. It allows you to write easier and faster code while still being correct in the few cases where the prefixes are not the expected ones. It does this by letting the user enter the expected namespace declarations and then comparing those to the actual namespace declarations in the document. If they match, everything will be as fast as possible. If they don't, it will be slower but still correct.
> 
> As an extra feature I have allowed the user to declare some extra namespaces (see fixNamespace() in KoXmlReader.cpp). This will let documents created by old versions of OpenOffice.org be read even though they use other namespaces.
> 
> I have code that uses this file but that is not yet ready for review. I wanted to put this up early to get feedback while the rest of the yet unfinished code is maturing.
> 
> 
> Diffs
> -----
> 
>   libs/odf/CMakeLists.txt 3680486 
>   libs/odf/KoXmlStreamReader.h PRE-CREATION 
>   libs/odf/KoXmlStreamReader.cpp PRE-CREATION 
> 
> Diff: http://git.reviewboard.kde.org/r/109887/diff/
> 
> 
> Testing
> -------
> 
> Not much. I will do that when the code that uses this code is ready. This review is for getting feedback on the ideas and implementation details.
> 
> 
> Thanks,
> 
> Inge Wallin
> 
>
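
The prefix handling described in the quoted request can be sketched in standard C++ without Qt. This is not the code under review: PrefixNormalizer and all of its methods are hypothetical names chosen for illustration, and the alias mechanism is only an assumption about how "extra" namespaces might be folded onto the expected ones.

```cpp
#include <map>
#include <string>

// Hypothetical helper, NOT the KoXmlStreamReader API. It maps the prefixes a
// document actually declares onto the prefixes the parsing code expects.
class PrefixNormalizer {
public:
    // Register the prefix the parsing code wants to see for a namespace URI.
    void addExpected(const std::string &prefix, const std::string &uri) {
        m_expectedByUri[uri] = prefix;
    }

    // Treat aliasUri (e.g. a namespace used by an older producer) like uri.
    // Does nothing if uri has not been registered with addExpected().
    void addAlias(const std::string &aliasUri, const std::string &uri) {
        auto it = m_expectedByUri.find(uri);
        if (it == m_expectedByUri.end())
            return;
        const std::string expected = it->second;
        m_expectedByUri[aliasUri] = expected;
    }

    // Record one namespace declaration found in the document.
    void declare(const std::string &prefix, const std::string &uri) {
        auto it = m_expectedByUri.find(uri);
        if (it == m_expectedByUri.end())
            return;                        // unknown namespace: leave it alone
        m_actualToExpected[prefix] = it->second;
        if (prefix != it->second)
            m_needsRewrite = true;         // at least one prefix differs
    }

    bool needsRewrite() const { return m_needsRewrite; }

    // Return the qualified name with the expected prefix.
    std::string qualifiedName(const std::string &rawName) const {
        if (!m_needsRewrite)
            return rawName;                // fast path: prefixes already match
        const std::string::size_type colon = rawName.find(':');
        if (colon == std::string::npos)
            return rawName;                // no prefix to rewrite
        auto it = m_actualToExpected.find(rawName.substr(0, colon));
        if (it == m_actualToExpected.end())
            return rawName;                // prefix of an unknown namespace
        return it->second + rawName.substr(colon);
    }

private:
    std::map<std::string, std::string> m_expectedByUri;     // URI -> expected prefix
    std::map<std::string, std::string> m_actualToExpected;  // actual -> expected prefix
    bool m_needsRewrite = false;
};
```

With "fo" registered as the expected prefix, a document that declares the same namespace under the prefix "x" would have "x:border" reported as "fo:border", so the calling code can keep comparing against "fo:border". The sketch ignores scoped redeclarations and clashes where a document reuses an expected prefix for an unrelated namespace.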
