Patch for msoscheme

Uzak Matus matus.uzak at ixonos.com
Tue Jan 10 17:31:58 GMT 2012


Hi again,

> 2. I can send you one more file with same kind of structure which Stage is not able to parse and open.  Come on.. I have seen structures in mso.xml which are not compliant with MS-ODRAW spec :)

Yes, that's true.  I just don't want you to waste time playing with workarounds for the containers order.  Especially in such a corrupt file.

The order of containers MUST strictly follow the corresponding MS-* specification and the ABNF grammar (specified in [RFC5234]) referenced in many places therein.  However, real-world documents produced by older versions of MS Office and by other office suites do NOT follow the rules.  I have a bunch of documents like yours that require us to look for a specific container in some other place, or that contain unknown containers.  The correct approach would be to redesign the parser so that it does NOT expect the containers in a predefined order.
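
To illustrate the idea of a parser that does not expect a predefined order, here is a minimal sketch in Python.  It is not how msoscheme or the generated parser works.  It assumes the usual 8-byte record header (16 bits of version/instance, 16-bit record type, 32-bit length, little-endian) and dispatches the children of a container by record type instead of by position.  The record type values are quoted from memory of MS-ODRAW and should be checked against the specification; parse_container, read_children and make_record are hypothetical names.

```python
import struct
from collections import defaultdict

# Record types as recalled from MS-ODRAW; verify against the spec before use.
KNOWN_TYPES = {
    0xF00B: "OfficeArtFOPT",
    0xF121: "OfficeArtSecondaryFOPT",
    0xF122: "OfficeArtTertiaryFOPT",
}


def read_children(buf):
    """Yield (rec_type, payload) for each record packed back to back in buf."""
    pos = 0
    # Trailing bytes too short to hold a header are ignored.
    while pos + 8 <= len(buf):
        _ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", buf, pos)
        start = pos + 8
        end = start + rec_len
        if end > len(buf):
            raise ValueError("record at offset %d runs past the container" % pos)
        yield rec_type, buf[start:end]
        pos = end


def parse_container(buf):
    """Group children by type, in whatever order they occur in the file."""
    known = defaultdict(list)
    unknown = []  # kept so the caller can log or skip them
    for rec_type, payload in read_children(buf):
        name = KNOWN_TYPES.get(rec_type)
        if name is None:
            unknown.append((rec_type, payload))
        else:
            known[name].append(payload)
    return known, unknown


def make_record(rec_type, payload):
    """Hypothetical helper that builds a record for the example below."""
    return struct.pack("<HHI", 0, rec_type, len(payload)) + payload


if __name__ == "__main__":
    # Tertiary options first, then an unknown record, then primary options.
    data = (make_record(0xF122, b"t")
            + make_record(0xF999, b"u")  # 0xF999 is a made-up type
            + make_record(0xF00B, b"p"))
    known, unknown = parse_container(data)
    print(sorted(known), unknown)
```

With this kind of dispatch the description of a container only has to list which children are allowed, not where each of them may appear, and unknown records end up in one place instead of being added to the schema per test file.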

I have already pointed out that mso.xml is becoming messy because known and unknown containers are added wherever they happened to occur in a single test file.

Please upload all your test files to the calligratests repository so we can test for regressions later.  I'm using the corresponding folder in calligratests/import.  Maybe we should have a separate corrupt_files folder.

>I guess our final goal is to make Stage as robust as possible.
Yes, our goal is to have robust filters by improving POLE and msoscheme first.  Then we have to improve the filters and relax the parser's checks against invalid files a bit.

br,

-matus

--
Matus Uzak
Software Designer
Ixonos Slovakia s.r.o.
Sturova 27, 040 01 Kosice, Slovakia
mobile 0421 918 718 958
email: matus.uzak at ixonos.com
http://www.ixonos.com
________________________________
From: calligra-devel-bounces at kde.org [calligra-devel-bounces at kde.org] on behalf of Mani N C [maninc at gmail.com]
Sent: Tuesday, January 10, 2012 6:22 AM
To: Calligra Suite developers and users mailing list
Subject: Re: Patch for msoscheme

Hi Matus,

Thanks for reviewing my patch.

I found this file on the internet, saved it in my Google Docs folder, and downloaded it from Google Docs. I did not try this file with 2007, but in 2010 I was able to open it, with some extra blank slides added to it. BTW, LibreOffice opened this file without any error.

1. I first tried to add TextContainerInteractiveInfo to TextContainer, but this did not fix the parsing issue. So I moved it to the parent structure, similar to what we have done for TextRulerAtom. I do agree it should have been TextContainerInteractiveInfo instead of MouseClickTextInfo. Since this only affects some corner cases, I thought MouseClickTextInfo would be sufficient.

2. I can send you one more file with same kind of structure which Stage is not able to parse and open.  Come on.. I have seen structures in mso.xml which are not compliant with MS-ODRAW spec :)

I guess our final goal is to make Stage as robust as possible. We can report an error saying the file does not follow the specs, but in the end we should still open the document. When saving a document we should stick to the specification, but we should be flexible when reading one.
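
The "strict when writing, flexible when reading" idea can be sketched roughly as follows.  This is only an illustration in Python, not code from Stage or msoscheme; check_children, SpecViolation and the strict flag are hypothetical names, and the child names are the ones discussed in this thread.

```python
class SpecViolation(Exception):
    """Raised in strict mode when a child is not allowed by the schema."""


# Children the schema allows in this (hypothetical) choice.
ALLOWED = {"OutlineTextRefAtom", "TextContainer", "TextRulerAtom"}


def check_children(child_names, allowed, strict=False):
    """Return (accepted, warnings); raise SpecViolation if strict is set.

    Lenient mode (reading): unexpected children are skipped and reported.
    Strict mode (writing or validating): the first unexpected child is fatal.
    """
    accepted = []
    warnings = []
    for name in child_names:
        if name in allowed:
            accepted.append(name)
        elif strict:
            raise SpecViolation("unexpected child %r" % name)
        else:
            warnings.append("skipping unexpected child %r" % name)
    return accepted, warnings


if __name__ == "__main__":
    children = ["TextContainer", "MouseClickTextInfo", "TextRulerAtom"]
    accepted, warnings = check_children(children, ALLOWED)
    print(accepted)
    print(warnings)
```

In lenient mode the document still opens and the warnings can be shown or logged; the same check with strict=True could be used before saving.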

If you have a better fix, feel free to update my patch, or let me know and I will fix it.

Thanks & Regards,
Mani

On Mon, Jan 9, 2012 at 10:22 PM, Uzak Matus <matus.uzak at ixonos.com> wrote:
Hi Mani,

I'm not happy about that patch (check the reasons below).  Do you know which application produced that file?

PowerPoint 2007 classified your test file as corrupt and displayed only some of the slides.  PowerPoint 2003 did not complain (it usually does not complain), but I don't know if the file was displayed properly.  After re-saving it in PowerPoint 2003, the file was displayed identically by Stage.

At the moment neither the parser nor the filter is prepared to handle such files, and I would prefer to revert this change in msoscheme.

Reasons:
1.  We invented a number of our own structures to keep related information together and logically sound, so that the filter stays as readable as possible.  Each of the choices defined therein occurs in a specific scenario or is used by a specific version of PowerPoint.  You mixed MouseClickTextInfo into data that specifies padding and indent, and that is logically unsound.

<struct name="TextClientDataSubContainerOrAtom">
               <choice name="anon">
                       <type type="OutlineTextRefAtom" />
                       <type type="TextContainer" />
                       <type type="TextRulerAtom" />
+                       <type type="MouseClickTextInfo" />
               </choice>
</struct>


2.  You introduced the following child of OfficeArtSpContainer :

               <type name="shapeTertiaryOptions2" type="OfficeArtTertiaryFOPT"
                       optional="true" />
+               <type name="shapePrimaryOptions2" type="OfficeArtFOPT" optional="true" />

This is not compliant with the MS-ODRAW specification; there is no shapePrimaryOptions2!  From my experience, MS-ODRAW containers always follow the Primary - Secondary - Tertiary order, so expecting shapePrimaryOptions2 to be saved after shapeSecondaryOptions2 and shapeTertiaryOptions2 is wrong.
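
As a rough illustration of the Primary - Secondary - Tertiary expectation, the following Python sketch reports option blocks that show up after a later kind of block has already been seen.  It is a hypothetical helper for diagnosing files, not part of msoscheme, and it works on record names only.

```python
# Expected relative order of the option blocks inside a shape container.
EXPECTED_ORDER = [
    "OfficeArtFOPT",            # primary
    "OfficeArtSecondaryFOPT",   # secondary
    "OfficeArtTertiaryFOPT",    # tertiary
]


def order_violations(seen):
    """Return (index, name) for every option block that appears too late.

    Names that are not option blocks are ignored.
    """
    rank = {name: i for i, name in enumerate(EXPECTED_ORDER)}
    highest = -1
    violations = []
    for index, name in enumerate(seen):
        r = rank.get(name)
        if r is None:
            continue
        if r < highest:
            violations.append((index, name))
        else:
            highest = r
    return violations


if __name__ == "__main__":
    # A primary block after secondary and tertiary blocks, as in the test file.
    print(order_violations([
        "OfficeArtFOPT",
        "OfficeArtSecondaryFOPT",
        "OfficeArtTertiaryFOPT",
        "OfficeArtFOPT",
    ]))
```

A check like this could flag the file as non-compliant without requiring a new child in the schema for each out-of-order block.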

--
Matus Uzak
Software Designer
Ixonos Slovakia s.r.o.
Sturova 27, 040 01 Kosice, Slovakia
mobile 0421 918 718 958
email: matus.uzak at ixonos.com
http://www.ixonos.com

________________________________________
From: calligra-devel-bounces at kde.org [calligra-devel-bounces at kde.org] on behalf of Jos van den Oever [jos at vandenoever.info]
Sent: Tuesday, January 03, 2012 1:17 PM
To: Calligra Suite developers and users mailing list
Subject: Re: Patch for msoscheme

On Tuesday, January 03, 2012 12:43:18 PM Mani N C wrote:
> Hi Jos,
>
> The patch for mso.xml will allow the filters to parse the attached ppt file.
> Though a lot of style information is still missing, I can at least view the
> file.
> I have tested calligrastage with a couple of other files and it works
> fine. If the patch is good enough, I will update the Stage filter with this
> patch.

Thank you for the patch. It looks good and applies and compiles fine. I've
pushed it to msoscheme and calligra.

Only after pushing did I see that you have a branch on gitorious I could have
pulled from.

http://gitorious.org/msoscheme/msoscheme/commit/2b6d38010f1953ee96be087f4ec3e428ff2a1c06

Cheers,
Jos
_______________________________________________
calligra-devel mailing list
calligra-devel at kde.org
https://mail.kde.org/mailman/listinfo/calligra-devel




--
Mani Chandrasekar
