D25256: Allow non-conforming LibreOffice PPT files to be imported

David Llewellyn-Jones noreply at phabricator.kde.org
Mon Nov 11 12:54:41 GMT 2019


davidllewellynjones created this revision.
davidllewellynjones added reviewers: pvuorela, dcaliste.
davidllewellynjones added a project: Calligra: 3.0.
Herald added a subscriber: Calligra-Devel-list.
davidllewellynjones requested review of this revision.

REVISION SUMMARY
  An apparent bug in the LibreOffice PPT exporter makes it output files which technically don't conform to the PPT specification. Calligra refuses to load these files. That may technically be the correct behaviour, but it is extremely annoying for the user. LibreOffice's deviation from the PPT spec is pretty minor, and a slight weakening of Calligra's validation allows the files to be imported successfully.
  
  In more detail, when loading a drawing, each text paragraph in the drawing has a TextPFRun structure ("A structure that specifies the paragraph-level formatting of a run of text"). This starts with a mask, followed by a sequence of fields. Only unmasked fields (those whose flag is set in the mask) are included in the sequence.
  
  According to Section 2.9.45 of the PPT specification version 6 <https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-ppt>, the following fields must be masked out:
  
    masks.leftMargin
    masks.indent
    masks.defaultTabSize
    masks.tabStops
  
  In spite of this, LibreOffice includes the `leftMargin` and `indent` fields (flags 0x100 and 0x400). I'm not familiar with the LibreOffice codebase, but it looks like this <https://cgit.freedesktop.org/libreoffice/core/tree/sd/source/filter/eppt/epptso.cxx?id=85d947d52d27b4ade68a735e23bd393bced26046#n709> is the problem code. From this same code it looks like LibreOffice doesn't export the `defaultTabSize` or `tabStops` fields (which is correct).
  
  This patch loosens Calligra's validation to allow these flags to be set. I've tested this with a bunch of files which previously failed to load, including quite complex ones, and they all seem to load fine once the patch is applied.
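  
  As a rough illustration of the difference between the strict and the loosened check (this is a sketch, not the binschema-generated code in simpleParser.cpp), the two rules could be written as below. The function and constant names are hypothetical. The 0x100 and 0x400 values are the flags mentioned above; the values used for `defaultTabSize` and `tabStops` are assumptions and should be checked against the specification.
  
  ```cpp
  #include <cstdint>

  // Flag bits within the TextPFRun mask.
  // kLeftMargin and kIndent are the flags quoted in the summary above.
  constexpr std::uint32_t kLeftMargin     = 0x000100;
  constexpr std::uint32_t kIndent         = 0x000400;
  // Assumed values, not taken from the summary; verify against the spec.
  constexpr std::uint32_t kDefaultTabSize = 0x008000;
  constexpr std::uint32_t kTabStops       = 0x100000;

  // Strict rule: all four fields must be masked out.
  constexpr bool strictMaskIsValid(std::uint32_t masks) {
      return (masks & (kLeftMargin | kIndent | kDefaultTabSize | kTabStops)) == 0;
  }

  // Loosened rule: leftMargin and indent are tolerated,
  // defaultTabSize and tabStops must still be masked out.
  constexpr bool relaxedMaskIsValid(std::uint32_t masks) {
      return (masks & (kDefaultTabSize | kTabStops)) == 0;
  }

  // A mask with leftMargin and indent set, as LibreOffice writes it.
  static_assert(!strictMaskIsValid(kLeftMargin | kIndent),
                "strict rule rejects the LibreOffice mask");
  static_assert(relaxedMaskIsValid(kLeftMargin | kIndent),
                "loosened rule accepts the LibreOffice mask");
  ```
  
  The loosened rule still rejects a mask that sets `defaultTabSize` or `tabStops`, so only the two flags LibreOffice is known to set are affected.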
  
  A couple of important notes.
  
  1. The validation code is generated by binschema <https://github.com/KDE/binschema/> and I'll submit a separate patch there.
  2. The `calligra/filters/libmso/generated/mso.jar` file also needs to be updated to this version <http://www.flypig.co.uk/dnload/dnload/other/mso.jar>, but I couldn't see a way to include the binary file in with this patch.

TEST PLAN
  1. Save out a file from LibreOffice in PPT format, or download this archive <http://www.flypig.co.uk/dnload/dnload/other/calligra-importppt.zip> with a test file inside.
  2. Attempt to load the file into Calligra Stage.
  3. Note that it refuses to load with the error "Invalid file format".
  4. Apply the patch.
  5. Attempt to load the same file again.
  6. Note that it loads correctly. If you used my test file, witness my amazing presentation design.

REPOSITORY
  R8 Calligra

REVISION DETAIL
  https://phabricator.kde.org/D25256

AFFECTED FILES
  filters/libmso/generated/simpleParser.cpp
  filters/libmso/generated/simpleParser.h

To: davidllewellynjones, pvuorela, dcaliste
Cc: Calligra-Devel-list, davidllewellynjones, dcaliste, cochise, vandenoever