[calligra] filters: Fix bug 276268 - “Format not recongnized” Error is displaying while opening the attached Excel 2010 file

Marijn Kruisselbrink mkruisselbrink at kde.org
Wed Aug 17 15:14:33 BST 2011


And for documentation sake:
The real problem is that the whole relying on QUALIFIED_NAME (and certain 
hardcoded namespace prefixes) is completely broken by design. There are no 
elements without namespaces anywhere here, just elements with empty namespace 
prefixes, but because of the (broken) design of the ooxml filters, that causes 
major issues as it relies on certain hard-coded namespace prefixes for 
namespaces (and completely ignores any namespace definitions in the file).
So this is not about "tags without namespaces", it is about "tags with default 
namespaces", to try to prevent any future confusion when somebody actually 
tries to fix this for real.

Marijn

On Wednesday, August 17, 2011 04:08:54 PM Inge Wallin wrote:
> Git commit e09771b20ef359dac24b74a5b7d02da9f72135e3 by Inge Wallin.
> Committed on 17/08/2011 at 16:03.
> Pushed by ingwa into branch 'master'.
> 
> Fix bug 276268 -  “Format not recongnized” Error is displaying while
> opening the attached Excel 2010 file
> 
> This problem was much more complicated that it looked on the surface.
> 
> 1. The easy part was that MSO 2010 defines something called
>    AlternateContent that defines the same data in two different ways.
>    The fallback way is the same as SO 2007 uses, so that's what we try
>    to dig out.
> 
> 2. A more serious problem was that the QUALIFIED_NAME macro was (and
>    still is) defined in a way that makes it impossible to use
>    READ_PROLOGUE et al. in a file that handles both tags with and
>    without namespaces.  See details in MsooXmlCommonReaderImpl.h,
>    where the namespace mc starts.
> 
> Bug: 276268
> 
> M  +15   -0    filters/tables/xlsx/XlsxXmlWorksheetReader.cpp
> M  +51   -17   filters/libmsooxml/MsooXmlCommonReaderImpl.h
> 
> http://commits.kde.org/calligra/e09771b20ef359dac24b74a5b7d02da9f72135e3
> 
> diff --git a/filters/libmsooxml/MsooXmlCommonReaderImpl.h
> b/filters/libmsooxml/MsooXmlCommonReaderImpl.h index 080b99d..3070a4c
> 100644
> --- a/filters/libmsooxml/MsooXmlCommonReaderImpl.h
> +++ b/filters/libmsooxml/MsooXmlCommonReaderImpl.h
> @@ -55,7 +55,7 @@ KoFilter::ConversionStatus
> MSOOXML_CURRENT_CLASS::read_t() READ_PROLOGUE
>      while (!atEnd()) {
>          readNext();
> -        kDebug() << *this;
> +        //kDebug() << *this;
>          if (isCharacters()) {
>              body->addTextSpan(text().toString());
>  #ifdef PPTXXMLSLIDEREADER_CPP
> @@ -68,6 +68,22 @@ KoFilter::ConversionStatus
> MSOOXML_CURRENT_CLASS::read_t() //kDebug() << "{1}";
>      READ_EPILOGUE
>  }
> +
> +
> +// ----------------------------------------------------------------
> +//                     New namespace: mc
> +
> +// ARRRRRGH!
> +
> +// The way that READ_PROLOGUE is defined via QUALIFIED_NAME makes it
> +// impossible to use it in files that handle tags both with and
> +// without namespaces.  This means that we cannot use READ_PROLOGUE in
> +// the functions below, and most likely also not the READ_IF variants.
> +// The above is only true when called from XmlWorksheetReader.  For Docx,
> +// there are always namespaces, so it doesn't apply.
> +// Same is true for READ_EPILOGUE.
> +
> +
>  #undef MSOOXML_CURRENT_NS
>  #define MSOOXML_CURRENT_NS "mc"
> 
> @@ -76,23 +92,28 @@ KoFilter::ConversionStatus
> MSOOXML_CURRENT_CLASS::read_t() //! Alternate content handler
>  KoFilter::ConversionStatus MSOOXML_CURRENT_CLASS::read_AlternateContent()
>  {
> -    READ_PROLOGUE
> -
>      m_choiceAccepted = false;
> 
>      while (!atEnd()) {
>          readNext();
> -        BREAK_IF_END_OF(CURRENT_EL)
> +        if (isEndElement() && name() == "AlternateContent") {
> +            break;
> +        }
> +
>          if (isStartElement()) {
> -            TRY_READ_IF(Choice)
> +            if (name() == "Choice") {
> +                TRY_READ(Choice)
> +            }
>              else if (!m_choiceAccepted && qualifiedName() ==
> "mc:Fallback") { TRY_READ(Fallback)
>              }
> -            SKIP_UNKNOWN
> +            else {
> +                skipCurrentElement();
> +            }
>          }
>      }
> 
> -    READ_EPILOGUE
> +    return KoFilter::OK;
>  }
> 
>  #undef CURRENT_EL
> @@ -100,28 +121,34 @@ KoFilter::ConversionStatus
> MSOOXML_CURRENT_CLASS::read_AlternateContent() //! Choice handler
>  KoFilter::ConversionStatus MSOOXML_CURRENT_CLASS::read_Choice()
>  {
> -    READ_PROLOGUE
> +
>      const QXmlStreamAttributes attrs(attributes());
> 
>      TRY_READ_ATTR_WITHOUT_NS(Requires)
> 
> +    // 'Requires="v"' means that the contents of the Choice part
> +    // is VML, which we support (or something else we do support,
> +    // Lassi is not sure).  For all other alternatives we
> +    // don't dare try to interpret it, but instead we use the
> +    // AlternateContent which is what MSO 2007 would have given us.
>      if (Requires != "v") {
>          skipCurrentElement();
> -        READ_EPILOGUE
> +        return KoFilter::OK;
>      }
> -    m_choiceAccepted = true;
> 
> +    m_choiceAccepted = true;
>      while (!atEnd()) {
>          readNext();
> -        BREAK_IF_END_OF(CURRENT_EL)
> +        if (isEndElement() && name() == "Choice") {
> +            break;
> +        }
>          if (isStartElement()) {
>  #ifdef PPTXXMLSLIDEREADER_CPP
>              TRY_READ_IF_NS(p, oleObj)
>  #endif
>          }
>      }
> -
> -    READ_EPILOGUE
> +    return KoFilter::OK;
>  }
> 
>  #undef CURRENT_EL
> @@ -129,19 +156,26 @@ KoFilter::ConversionStatus
> MSOOXML_CURRENT_CLASS::read_Choice() //! Fallback handler
>  KoFilter::ConversionStatus MSOOXML_CURRENT_CLASS::read_Fallback()
>  {
> -    READ_PROLOGUE
> 
>      while (!atEnd()) {
>          readNext();
> -        BREAK_IF_END_OF(CURRENT_EL)
> +        if (isEndElement() && name() == "Fallback") {
> +            break;
> +        }
> +
>          if (isStartElement()) {
>  #ifdef DOCXXMLDOCREADER_H
>              TRY_READ_IF_NS(w, pict)
>  #endif
> +#ifdef XLSXXMLWORKSHEETREADER_CPP
> +            // FIXME: This Choice/Content/Fallback structure needs a more
> general treatment. +            if (name() == "oleObject") {
> +                TRY_READ(oleObject)
> +            }
> +#endif
>          }
>      }
> -
> -    READ_EPILOGUE
> +    return KoFilter::OK;
>  }
> 
>  #endif
> diff --git a/filters/tables/xlsx/XlsxXmlWorksheetReader.cpp
> b/filters/tables/xlsx/XlsxXmlWorksheetReader.cpp index 8269ec7..6470efe
> 100644
> --- a/filters/tables/xlsx/XlsxXmlWorksheetReader.cpp
> +++ b/filters/tables/xlsx/XlsxXmlWorksheetReader.cpp
> @@ -56,6 +56,8 @@
> 
>  #include "NumberFormatParser.h"
> 
> +#define XLSXXMLWORKSHEETREADER_CPP
> +
>  #define UNICODE_EUR 0x20AC
>  #define UNICODE_GBP 0x00A3
>  #define UNICODE_JPY 0x00A5
> @@ -68,8 +70,16 @@
> 
>  #include <math.h>
> 
> +// ----------------------------------------------------------------
> +// Include implementation of common tags
> +
>  #include <MsooXmlCommonReaderImpl.h> // this adds p, pPr, t, r, etc.
> 
> +#undef  MSOOXML_CURRENT_NS // tags without namespace
> +#define MSOOXML_CURRENT_NS
> +
> +// ----------------------------------------------------------------
> +
>  #define NO_DRAWINGML_NS
>  #define NO_DRAWINGML_PIC_NS // DrawingML/Picture
> 
> @@ -2154,6 +2164,11 @@ KoFilter::ConversionStatus
> XlsxXmlWorksheetReader::read_oleObjects() BREAK_IF_END_OF(CURRENT_EL)
>          if( isStartElement() ) {
>              TRY_READ_IF(oleObject)
> +            // It seems that MSO 2010 has a concept of Alternate
> +            // Content, which it throws in at unexpected times.
> +            // This is one such time.  So let's try to find the
> +            // oleObject inside an mc:AlternateContent tag if possible.
> +            ELSE_TRY_READ_IF_NS(mc, AlternateContent)   // Should be more
> specialized what we are looking for ELSE_WRONG_FORMAT
>          }
>      }



More information about the calligra-devel mailing list