Style objects for applications and filters
Inge Wallin
inge at lysator.liu.se
Mon Apr 28 21:31:45 BST 2014
On Sunday, April 27, 2014 06:21:55 Thorsten Zachmann wrote:
> On Friday, April 25, 2014 12:50:15 PM Inge Wallin wrote:
> > I would like to have the opinion of the community for a certain issue. The
> > first version of the docx export filter is now merged and it's time to take
> > the next step. But this next step involves a decision which needs the input
> > from the rest of the Calligra community.
> >
> > The docx filter uses stream reading to parse the odf file, using
> > KoXmlStreamReader in filters/libodf2, as opposed to our applications that
> > use DOM reading, using the KoXmlReader in libs/odf.
>
> It is not 100% correct. The libs/odf also use stream reader but offers a DOM
> interface to use it.
Yes. I meant the user visible part of it all. I guess it's logical that there
is stream reading underneath. (In this case "user" means the code using the
XML reader, not the person behind the keyboard.)
> > One thing that we will need is page layout properties, which has child
> > elements for columns among other things and this is where decision lies.
> > Page layout properties already has good support in a couple of objects in
> > libs/odf, among them KoPageLayout, KoColumns and KoBorder.
> >
> > Those objects are designed to be fast and use binary representation of
> > their properties. On the other hand many of them are not complete, i.e.
> > they don't read, store and write all the properties that are defined in
> > the standard. And they don't support stream reading. The style objects in
> > filters/libodf2 are all designed to be complete if not super fast - yet.
> >
> > So here is the question: When we move forward with the docx filter, should
> > we either:
> > - add stream reading capabilities and maybe storage of properties
> >   represented as text to the objects in libs/odf
> > or:
> > - create new objects like the previous style objects in filters/libodf2
>
> I'm not sure about that. If we don't support the stuff in Calligra, how would
> you like to support it in the filter? If we have support for the stuff, the
> object we use should support it. Do I miss something important here?
Well, my personal opinion is that libs/odf should be a library for reading,
storing and saving ODF entities. But when this was up for discussion last
time, you yourself thought that libs/odf should only contain such objects that
are used in the applications. The opinion then was that those objects or
properties that are used in the filters only should be put under filters/libodf2
instead. So at that time I moved everything to filters/libodf2 and that's where
they are now. The mail discussion in question had the subject "Purpose and
scope of libs/odf" and the first mail was sent 16th of June 2013.
But I agree that duplication is not the way we should go forward.
> > The first one makes more sense to me since we should reduce duplication in
> > Calligra at large and this will also mean that the objects will be made
> > complete in relation to the attributes defined by the standard. But it
> > may
> > also mean that not all attributes will be available in binary form through
> > named class methods.
>
> Do you have an example of how that would look? From the textual description
> I'm not sure if that is what we should do. Using a binary representation has
> a very big advantage and that is the memory used to store the stuff compared
> to e.g. having a map of properties if there are a lot of them.
>
> Maybe you can provide more details so there is something we can discuss
> about.
Sure.
There are two problems with the current style classes which parse the data
into binary form:
1. They don't support everything in the standard. Most of them only parse part
of the available attributes and the ones that are not implemented are simply
lost.
2. They don't follow the general behaviour suggested by the standard to
preserve and save back formatting properties that are not recognized by the
application itself.
In the style classes in filters/libodf2 I try to fix this by simply storing the
text properties in name/value form in a class named KoOdfStyleProperties which
is a base class for classes like KoOdfTextProperties or
KoOdfParagraphProperties. In this base class there is this simple definition:
    typedef QHash<QString, QString> AttributeSet; // name, value
And there are also some simple getters and setters like:
    QString attribute(const QString &property) const;
    void setAttribute(const QString &property, const QString &value);
So everything is handled in string form.
Some property types, like the paragraph properties, also define some more
complex types like dropcaps which are also handled in string form:
    struct KoOdfStyleDropCap
    {
        AttributeSet attributes;
    };
Again, this means that all properties are read, stored and potentially written
back. There is no way to miss anything.
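As a rough illustration of the name/value idea, here is a minimal sketch. It is
not the actual KoOdfStyleProperties code: the class name StyleProperties and the
attributes() accessor are made up for this example, and std::string and
std::unordered_map stand in for QString and QHash only so that the sketch
compiles without Qt.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for the AttributeSet typedef quoted above (assumption: std types
// replace QHash/QString purely to keep the sketch free of Qt).
using AttributeSet = std::unordered_map<std::string, std::string>; // name, value

// Hypothetical, simplified counterpart of a string-based style property class.
class StyleProperties
{
public:
    // Returns the stored value, or an empty string if the attribute is absent.
    std::string attribute(const std::string &property) const
    {
        const auto it = m_attributes.find(property);
        return it == m_attributes.end() ? std::string() : it->second;
    }

    // Stores any attribute, whether or not the application understands it.
    void setAttribute(const std::string &property, const std::string &value)
    {
        assert(!property.empty()); // attribute names must not be empty
        m_attributes[property] = value;
    }

    // Everything that was read is still here when it is time to save.
    const AttributeSet &attributes() const { return m_attributes; }

private:
    AttributeSet m_attributes;
};
```

Because unknown attributes are stored exactly like known ones, a save routine
that walks attributes() writes back whatever was loaded.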
But of course there are advantages to using binary representation instead.
Speed is one, especially if there are complex objects that are accessed many
times. Memory consumption is another, as you point out above.
The speed can be fixed by treating the binary representation as a form of
cache. If (say) a line width is requested, we can parse this item and give it
to the calling code in binary form. We would of course also save this binary
representation so that we don't have to parse it the next time.
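The cache idea could look roughly like the sketch below. This is an assumption
about one possible shape, not existing Calligra code: the class name
CachedStyleProperties, the method lengthInPoints() and the parseCount() counter
are all hypothetical, std types replace the Qt ones, and only values with a
"pt" suffix are understood.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <unordered_map>

// Hypothetical sketch: text attributes are the source of truth, and parsed
// ("binary") values are kept in a cache that is filled on first access.
class CachedStyleProperties
{
public:
    void setAttribute(const std::string &name, const std::string &value)
    {
        assert(!name.empty());
        m_attributes[name] = value;
        m_lengthCache.erase(name); // the text changed, so drop any stale parsed value
    }

    // Fetches a length in points. Returns false if the attribute is missing
    // or is not of the form "<number>pt" (the only unit this sketch handles).
    bool lengthInPoints(const std::string &name, double &points) const
    {
        const auto cached = m_lengthCache.find(name);
        if (cached != m_lengthCache.end()) {
            points = cached->second; // cache hit: no parsing needed
            return true;
        }
        const auto it = m_attributes.find(name);
        if (it == m_attributes.end())
            return false;
        const std::string &text = it->second;
        if (text.size() < 3 || text.compare(text.size() - 2, 2, "pt") != 0)
            return false;
        try {
            std::size_t used = 0;
            const double value = std::stod(text, &used);
            if (used != text.size() - 2)
                return false; // junk between the number and the unit
            m_lengthCache[name] = value;
            ++m_parseCount;
            points = value;
            return true;
        } catch (const std::exception &) {
            return false; // not a number at all
        }
    }

    // How many times text was actually parsed; only here to show the caching.
    int parseCount() const { return m_parseCount; }

private:
    std::unordered_map<std::string, std::string> m_attributes;
    mutable std::unordered_map<std::string, double> m_lengthCache;
    mutable int m_parseCount = 0;
};
```

Asking twice for the same attribute parses the text only once, and setting the
attribute again invalidates the cached value.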
The memory usage is not really fixable, but how bad is it? The
paragraph-properties element is one of the biggest property sets. It has 53
defined properties in ODF 1.2, whose names add up to almost exactly 1 KB. Note
that this is the number of attributes that are defined, i.e. the maximum number
of actual attributes. Most actual styles will have far fewer. The values will
generally be shorter than the names, so let's assume that with overhead we have
2 KB per property set, give or take a few bytes. Other estimates are that we
will have perhaps 50 styles at most in a largish document, with on average 3
property sets each. So this will be 2 * 50 * 3 = 300 KB, where the content can
easily be tens of MB. Not that big a deal. And if it does become a problem we
can always find a more compact storage representation than QString while still
keeping the API.
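For anyone who wants to play with the numbers, the estimate can be written down
as a compile-time check. All figures are the rough guesses from the paragraph
above, not measurements, and the constant and function names are invented for
this sketch. The 10 MB document size is just the low end of "tens of MB".

```cpp
// Rough guesses from the text, not measured values.
constexpr int kKBPerPropertySet = 2;      // ~1 KB of names plus values and overhead
constexpr int kStylesPerDocument = 50;    // a "largish" document
constexpr int kPropertySetsPerStyle = 3;  // average per style

constexpr int estimatedStyleMemoryKB(int kbPerSet, int styles, int setsPerStyle)
{
    return kbPerSet * styles * setsPerStyle;
}

constexpr int kTotalKB = estimatedStyleMemoryKB(kKBPerPropertySet,
                                                kStylesPerDocument,
                                                kPropertySetsPerStyle);
static_assert(kTotalKB == 300, "2 * 50 * 3 should come out as 300 KB");

// Share of an assumed 10 MB document, in whole percent (integer division
// rounds down, so 2 here means "just under 3 percent").
constexpr int kPercentOf10MB = kTotalKB * 100 / (10 * 1024);
static_assert(kPercentOf10MB == 2, "300 KB is a little under 3% of 10 MB");
```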
So here is a summary of what I want to do:
* Represent styles as text attributes wherever possible so that they can be
complete, i.e. not throw away style data.
* Implement a binary interface for some (in the long run all) attributes so
that we retain the advantages of that (mostly speed).
* Because all our apps use DOM parsing now (even if stream reading is used
under the hood) and many filters use stream parsing, I want to provide both
variants of the loadOdf() method.
* I want these style objects to be in libs/odf. In the long run, the current
ones in kotext/styles should also move there but those are pretty complex and
we need to be careful so that we don't introduce bugs.
In the short run I just want to avoid duplication and all patches will of
course go through review.
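To make the "both variants of loadOdf()" point concrete, here is a sketch of
how the two overloads could share one storage path. Everything in it is a
stand-in: DomElement and StreamReader are toy types, not KoXmlElement or
KoXmlStreamReader, and PageLayoutProperties is a made-up class name. The real
signatures would have to be worked out in review.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using AttributeList = std::vector<std::pair<std::string, std::string>>;

// Toy stand-in for a DOM element: all attributes are available at once.
struct DomElement {
    AttributeList attributes;
};

// Toy stand-in for a stream reader positioned on one element at a time.
class StreamReader
{
public:
    explicit StreamReader(std::vector<AttributeList> elements)
        : m_elements(std::move(elements)) {}
    bool atEnd() const { return m_pos >= m_elements.size(); }
    const AttributeList &currentAttributes() const
    {
        assert(!atEnd()); // caller must check atEnd() first
        return m_elements[m_pos];
    }
    void skipCurrentElement() { ++m_pos; }

private:
    std::vector<AttributeList> m_elements;
    std::size_t m_pos = 0;
};

// Hypothetical style object offering both loading variants.
class PageLayoutProperties
{
public:
    // DOM variant, as used by the applications.
    bool loadOdf(const DomElement &element)
    {
        store(element.attributes);
        return true;
    }

    // Stream variant, as used by filters; consumes the current element.
    bool loadOdf(StreamReader &reader)
    {
        if (reader.atEnd())
            return false;
        store(reader.currentAttributes());
        reader.skipCurrentElement();
        return true;
    }

    std::string attribute(const std::string &name) const
    {
        const auto it = m_attributes.find(name);
        return it == m_attributes.end() ? std::string() : it->second;
    }

private:
    // Both variants end up here, so the stored data is identical.
    void store(const AttributeList &attributes)
    {
        for (const auto &attr : attributes)
            m_attributes[attr.first] = attr.second;
    }

    std::unordered_map<std::string, std::string> m_attributes;
};
```

The point of the shared store() step is that an object loaded from DOM and one
loaded from a stream hold the same name/value data afterwards.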
I hope that this is what you were looking for.
-Inge
> A nice weekend,
>
> Thorsten