Double^W Quadruple speed parsing of binary MS Office files
Sebastian Sauer
mail at dipe.org
Thu Jun 16 01:24:57 BST 2011
On Tuesday 14 June 2011 09:19:20 Jos van den Oever wrote:
> On Monday, June 13, 2011 19:02:09 PM Jos van den Oever wrote:
> > When run on a set of 600 ppt files from a.o. kofficetests, this is the
> > output from valgrind:
> > simpletest: (normal run time: 5.7 seconds)
> > ==28930== total heap usage: 2,457,961 allocs, 2,457,954 frees,
> > 218,241,950 bytes allocated
> > apitest: (normal run time: 2.9 seconds)
> > ==28852== total heap usage: 254,832 allocs, 254,825 frees, 52,421,077
> > bytes allocated
>
> The speed for apitest is now down to 1.3 seconds, making the speedup 4.3x.
> The other numbers stay the same.
Impressive. Thanks for sharing.
> The current parser that Calligra uses, uses QSharedPointer, QList, QVector
> and QByteArray. api.h does not use any of these.
In the MSWord filter we do:
QBuffer buffer;
QByteArray array;
array.resize(stream.size());
unsigned long r = stream.read((unsigned char*)array.data(), stream.size());
buffer.setData(array);
LEInputStream wdstm(&buffer);
where, according to massif, the stream.read accounts for >70% of the memory
during the doc=>odt conversion. Your note above made me wonder whether we
could avoid that allocation and operate directly on the stream...
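
A rough sketch of what operating directly on the stream could look like, in plain C++ without Qt. Everything here is an assumption for illustration: FakeOleStream stands in for the `stream` object above (it only has read() and size()), and ChunkedStreamBuf and readUInt32LE are made-up names, not Calligra classes. The adapter refills a small fixed buffer on demand, so peak memory is one chunk instead of stream.size() bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <streambuf>
#include <utility>
#include <vector>

// Hypothetical stand-in for the OLE stream: sequential read() plus size().
class FakeOleStream {
public:
    explicit FakeOleStream(std::vector<unsigned char> bytes)
        : data_(std::move(bytes)) {}

    std::size_t size() const { return data_.size(); }

    // Copies up to maxlen bytes into buf and returns the number copied.
    std::size_t read(unsigned char* buf, std::size_t maxlen) {
        const std::size_t n = std::min(maxlen, data_.size() - pos_);
        if (n > 0) {
            std::memcpy(buf, data_.data() + pos_, n);
            pos_ += n;
        }
        return n;
    }

private:
    std::vector<unsigned char> data_;
    std::size_t pos_ = 0;
};

// Pulls data from the source one chunk at a time instead of
// copying the whole stream into a single large array first.
class ChunkedStreamBuf : public std::streambuf {
public:
    explicit ChunkedStreamBuf(FakeOleStream& src, std::size_t chunk = 4096)
        : src_(src), buf_(chunk) {
        assert(chunk > 0);
    }

protected:
    // Called by the standard library whenever the get area is exhausted.
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        const std::size_t n = src_.read(
            reinterpret_cast<unsigned char*>(buf_.data()), buf_.size());
        if (n == 0)
            return traits_type::eof();
        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    FakeOleStream& src_;
    std::vector<char> buf_;
};

// Reads one little-endian 32-bit value, independent of host byte order.
inline bool readUInt32LE(std::istream& in, std::uint32_t& out) {
    unsigned char b[4];
    if (!in.read(reinterpret_cast<char*>(b), 4))
        return false;
    out = std::uint32_t(b[0]) | (std::uint32_t(b[1]) << 8) |
          (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24);
    return true;
}
```

With Qt the same idea would presumably be a QIODevice subclass that implements readData() on top of the stream and is handed to LEInputStream. Note that this sketch is strictly sequential: if the parser needs to seek or rewind, the adapter would need that too, and that may be why the full copy exists today.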