Style objects for applications and filters

Mon Apr 28 21:31:45 BST 2014

On Sunday, April 27, 2014 06:21:55 Thorsten Zachmann wrote:
> On Friday, April 25, 2014 12:50:15 PM Inge Wallin wrote:
> > I would like to have the opinion of the community for a certain issue. The
> > first version of the docx export filter is now merged and it's time to take
> > the next step. But this next step involves a decision which needs the input
> > from the rest of the Calligra community.
> > 
> > The docx filter uses stream reading to parse the odf file, using
> > KoXmlStreamReader in filters/libodf2, as opposed to our applications that
> > use DOM reading, using the KoXmlReader in libs/odf.
> 
> That is not 100% correct. libs/odf also uses a stream reader but offers a DOM
> interface on top of it.

Yes. I meant the user visible part of it all.  I guess it's logical that there 
is stream reading underneath. (In this case "user" means the code using the 
XML reader, not the person behind the keyboard.)

> > One thing that we will need is page layout properties, which have child
> > elements for columns among other things, and this is where the decision lies.
> > Page layout properties already has good support in a couple of objects in
> > libs/odf, among them KoPageLayout, KoColumns and KoBorder.
> > 
> > Those objects are designed to be fast and use binary representation of
> > their properties. On the other hand many of them are not complete, i.e.
> > they don't read, store and write all the properties that are defined in
> > the standard. And they don't support stream reading. The style objects in
> > filters/libodf2 are all designed to be complete if not super fast - yet.
> > 
> > So here is the question: When we move forward with the docx filter, should
> > we either:
> >  - add stream reading capabilities and maybe storage of properties
> >    represented as text to the objects in libs/odf
> > or:
> >  - create new objects like the previous style objects in filters/libodf2
> 
> I'm not sure about that. If we don't support the stuff in Calligra, how would
> you like to support it in the filter? If we have support for the stuff, the
> object we use should support it. Do I miss something important here?

Well, my personal opinion is that libs/odf should be a library for reading, 
storing and saving ODF entities. But when this was up for discussion last 
time, you yourself thought that libs/odf should only contain such objects that 
are used in the applications. The opinion then was that those objects or 
properties that are used in the filters only should be put under filters/libodf2 
instead. So at that time I moved everything to filters/libodf2 and that's where 
they are now.  The mail discussion in question had the subject "Purpose and 
scope of libs/odf" and the first mail was sent 16th of June 2013.

But I agree that duplication is not the way we should go forward.

> > The first one makes more sense to me since we should reduce duplication in
> > Calligra at large and this will also mean that the objects will be made
> > complete in relation to the attributes defined by the standard.  But it may
> > also mean that not all attributes will be available in binary form through
> > named class methods.
> 
> Do you have an example of how that would look? From the textual description
> I'm not sure if that is what we should do. Using a binary representation has
> a very big advantage and that is the memory used to store the stuff compared
> to e.g. having a map of properties if there are a lot of them.
> 
> Maybe you can provide more details so there is something we can discuss
> about.

Sure.

There are two problems with the current style classes which parse the data 
into binary form:

1. They don't support everything in the standard. Most of them only parse part 
of the available attributes and the ones that are not implemented are simply 
lost.

2. They don't follow the general behaviour suggested by the standard to 
preserve and save back formatting properties that are not recognized by the 
application itself.

In the style classes in filters/libodf2 I try to fix this by simply storing the 
properties as text in name/value form in a class named KoOdfStyleProperties, 
which is a base class for classes like KoOdfTextProperties or 
KoOdfParagraphProperties. In this base class there is this simple definition:

  typedef  QHash<QString, QString>  AttributeSet;  // name, value

And there are also some simple getters and setters like:

    QString attribute(const QString &property) const;
    void    setAttribute(const QString &property, const QString &value);

So everything is handled in string form.
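
To make the name/value idea concrete, here is a minimal sketch of such a store. It is not the actual KoOdfStyleProperties code: the class name StyleProperties and the attributes() accessor are hypothetical, and std::unordered_map and std::string stand in for QHash and QString so that the example builds without Qt.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the AttributeSet typedef in the post; the real
// one uses QHash<QString, QString>.
using AttributeSet = std::unordered_map<std::string, std::string>; // name, value

// Hypothetical class, loosely modelled on the getters and setters above.
class StyleProperties
{
public:
    // Returns the stored value, or an empty string if the attribute is unset.
    std::string attribute(const std::string &property) const
    {
        const auto it = m_attributes.find(property);
        return it == m_attributes.end() ? std::string() : it->second;
    }

    void setAttribute(const std::string &property, const std::string &value)
    {
        m_attributes[property] = value;
    }

    // Every attribute that was set, recognized or not, is still here and
    // can be written back unchanged.
    const AttributeSet &attributes() const { return m_attributes; }

private:
    AttributeSet m_attributes;
};
```

Because nothing is interpreted on the way in, an attribute that the application does not know about is kept exactly like one that it does.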

Some property types, like the paragraph properties, also define some more 
complex types like dropcaps which are also handled in string form:

  struct KoOdfStyleDropCap
  {
      AttributeSet attributes;
  };

Again, this means that all properties are read, stored and potentially written 
back. There is no way to miss anything.

But of course there are advantages to using binary representation instead. 
Speed is one, especially if there are complex objects that are accessed many 
times. Memory consumption is another, which you point out above.

The speed can be fixed by handling the binary representation as a form of 
cache. If (say) a line width is requested, we can parse this item and give it 
to the calling code in binary form. And of course also save this binary 
representation so that we don't have to parse it the next time.
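
A rough sketch of that cache idea follows. All names here (CachedProperties, lengthInPoints, parseCount) are made up for illustration, the standard library again stands in for Qt, and the length parser only knows a handful of units; it is an assumption about how this could look, not existing Calligra code.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <unordered_map>

// Hypothetical class: text attributes are the source of truth, and parsed
// (binary) values are cached the first time they are requested.
class CachedProperties
{
public:
    void setAttribute(const std::string &name, const std::string &value)
    {
        m_attributes[name] = value;
        m_lengthCache.erase(name); // any cached binary value is now stale
    }

    // Stores the length in points in *result and returns true on success.
    // The text is parsed only on the first successful request.
    bool lengthInPoints(const std::string &name, double *result)
    {
        const auto cached = m_lengthCache.find(name);
        if (cached != m_lengthCache.end()) {
            *result = cached->second;
            return true;
        }
        const auto it = m_attributes.find(name);
        if (it == m_attributes.end())
            return false;
        ++m_parseCount;
        if (!parseLength(it->second, result))
            return false;
        m_lengthCache[name] = *result;
        return true;
    }

    int parseCount() const { return m_parseCount; }

private:
    // Simplified parser: a number followed by pt, in, cm or mm.
    static bool parseLength(const std::string &text, double *result)
    {
        char *end = nullptr;
        const double number = std::strtod(text.c_str(), &end);
        if (end == text.c_str())
            return false;
        const std::string unit(end);
        if (unit == "pt") { *result = number; return true; }
        if (unit == "in") { *result = number * 72.0; return true; }
        if (unit == "cm") { *result = number * 72.0 / 2.54; return true; }
        if (unit == "mm") { *result = number * 72.0 / 25.4; return true; }
        return false;
    }

    std::unordered_map<std::string, std::string> m_attributes;
    std::unordered_map<std::string, double> m_lengthCache;
    int m_parseCount = 0;
};
```

The cost of this design is that each accessed attribute is stored twice, once as text and once in binary form, so it trades some memory for speed.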

The memory usage is not really fixable, but how bad is it? The 
paragraph-properties set is one of the biggest property sets. It has 53 defined 
attributes in ODF 1.2, whose names add up to almost exactly 1 KB. Note that this 
is the number of attributes that are defined, i.e. the maximum number of actual 
attributes. Most actual styles will have far fewer. The values will generally be 
shorter than the names, so let's assume that with overhead we have 2 KB per 
property set, give or take a few bytes. Another estimate is that we will have 
perhaps 50 styles at most in a largish document, with on average 3 property sets 
each. So this will be 2 * 50 * 3 = 300 KB, where the content can easily be 10s 
of MB. Not that big a deal. And if it does become a problem we can always find a 
more compact storage representation than QString while still keeping the API.
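
The estimate can be written down as a tiny calculation so that it is easy to redo with other guesses. The function and constant names are made up for this sketch, and the inputs are the rough assumptions from the paragraph above, not measurements.

```cpp
#include <cassert>

// Rough inputs taken from the estimate in the text (assumptions, not measured).
constexpr int kKBPerPropertySet = 2;     // ~1 KB of names plus values and overhead
constexpr int kStylesPerDocument = 50;   // a largish document
constexpr int kPropertySetsPerStyle = 3; // average per style

// Worst-case memory for text-based style storage, in KB.
constexpr int estimatedStyleMemoryKB(int kbPerSet, int styles, int setsPerStyle)
{
    return kbPerSet * styles * setsPerStyle;
}

static_assert(estimatedStyleMemoryKB(kKBPerPropertySet, kStylesPerDocument,
                                     kPropertySetsPerStyle) == 300,
              "matches the 300 KB figure in the text");
```

Even with ten times as many styles the same formula gives 3000 KB, about 3 MB, which is still small next to content in the 10s of MB.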

So here is a summary of what I want to do:
 * Represent styles as text attributes wherever possible so that they can be 
complete, i.e. not throw away style data.
 * Implement a binary interface for some (in the long run all) attributes so 
that we retain the advantages of that (mostly speed).
 * Because all our apps use DOM parsing now (even if stream reading is used 
under the hood) and many filters use stream parsing, I want to provide both 
variants of the loadOdf() method.
 * I want these style objects to be in libs/odf. In the long run, the current 
ones in kotext/styles should also move there but those are pretty complex and 
we need to be careful so that we don't introduce bugs.
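
As an illustration of the third point, two loadOdf() overloads can fill the same attribute store. This is only a sketch under assumptions: DomElement and StreamReader are hypothetical stand-ins for the real KoXmlElement and KoXmlStreamReader types, and the signatures are invented for the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using AttributeSet = std::unordered_map<std::string, std::string>; // name, value

// Hypothetical DOM-style element: all attributes are available at once.
struct DomElement {
    AttributeSet attributes;
};

// Hypothetical stream reader positioned on a start element: attributes are
// delivered one by one in document order.
struct StreamReader {
    std::vector<std::pair<std::string, std::string>> attributes;
};

// Hypothetical style object with both loading variants.
class StyleProperties
{
public:
    bool loadOdf(const DomElement &element)
    {
        for (const auto &attr : element.attributes)
            m_attributes[attr.first] = attr.second;
        return true;
    }

    bool loadOdf(const StreamReader &reader)
    {
        for (const auto &attr : reader.attributes)
            m_attributes[attr.first] = attr.second;
        return true;
    }

    std::string attribute(const std::string &name) const
    {
        const auto it = m_attributes.find(name);
        return it == m_attributes.end() ? std::string() : it->second;
    }

private:
    AttributeSet m_attributes;
};
```

Both variants end up in the same storage, so the rest of the class does not need to know which kind of reader was used.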

In the short run I just want to avoid duplication and all patches will of 
course go through review.

I hope that this is what you were looking for.

	-Inge

> A nice weekend,
> 
> Thorsten