On Memory consumption of KDevelop-PG

Sat Dec 19 17:44:22 UTC 2009

On Saturday 19 December 2009 10:49:47 Andreas Pakulat wrote:
> On 19.12.09 03:50:01, Milian Wolff wrote:
> > I did a massif run with duchainify on MediaWiki and, well, as far as I can
> > see there is no apparent memory leak.
> >
> > http://mwolff.pastebin.com/f3581a8d6
> >
> > What bugs me is the peak... See also:
> > http://mwolff.pastebin.com/f4a66d631
> >
> > I mean, it's pretty clear why our memory consumption is so high:
> >
> > - every visited node gets an AST node, even those that are just general
> > helpers (AST == Tree... duh)
> 
> IMHO it's no big deal if the parser needs some 200M for parsing a single
> file, as long as that memory is free for re-use again afterwards. In
> particular, it must not be fragmented (which should already work due to
> the use of the memory pool). For how to make the AST smaller, see further
> down under your "compress the AST" question.

Hm.
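Andreas's point is that a high peak is acceptable as long as the parser's memory is handed back in one piece afterwards. A minimal sketch of that idea, assuming a hypothetical `NodeArena` class (this is not KDevelop-PG-Qt's actual memory pool API): nodes are bump-allocated from large blocks and released all at once, so parsing one file does not leave scattered small holes behind.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena (not KDevelop-PG-Qt's real memory pool API).
// Nodes are carved out of large blocks and released all at once, so parsing
// a file does not leave many small freed holes in the heap.
class NodeArena {
public:
    explicit NodeArena(std::size_t blockSize = 64 * 1024) : blockSize_(blockSize) {}

    void* allocate(std::size_t size, std::size_t align) {
        assert(size <= blockSize_ && "this sketch does not handle oversized nodes");
        // Round the current offset up to the requested alignment.
        std::size_t offset = (used_ + align - 1) / align * align;
        if (blocks_.empty() || offset + size > blockSize_) {
            // new[] on unsigned char yields memory aligned for fundamental types.
            blocks_.push_back(std::make_unique<unsigned char[]>(blockSize_));
            offset = 0;
        }
        used_ = offset + size;
        return blocks_.back().get() + offset;
    }

    template <typename T, typename... Args>
    T* create(Args&&... args) {
        // clear() never runs destructors, so only trivially destructible nodes are allowed.
        static_assert(std::is_trivially_destructible<T>::value,
                      "node types must be trivially destructible");
        return new (allocate(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }

    // Releases every node at once, e.g. after the tree has been processed.
    void clear() {
        blocks_.clear();
        used_ = 0;
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    std::size_t blockSize_;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

The design choice is that individual nodes are never freed: the whole per-file tree goes away with one `clear()`, which is what keeps the heap from fragmenting.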

> Also, this is a parser generator, so it has to find a balance between
> being able to parse certain constructs and being efficient. You can
> always take the generated parser, stop regenerating it and consider it
> hand-written. Then you could adjust it to your needs...

That's too early for us, imo. And given that PHP is still in development and
new features are added from year to year, I doubt we will ever want to go that
route, especially considering that we still don't support the PHP 5.3-specific
syntax. Also, PHP 6.0 will likely be released in the next year(s).

> > - is the ducontext pointer on _every_ node really necessary? Imo it only
> > makes sense for functions, classes and top-statements... At least the
> > parts in PHP where we use it can be changed to pass a currentContext()
> > instead of using the AST member.
> 
> No, most probably this is not needed. However, IIRC it's a bit more
> cumbersome to get members into specific AST nodes only.

No, not at all, see:
http://techbase.kde.org/Development/KDevelop-PG-Qt_Introduction#defining_additional_variables_for_the_parse_tree
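Passing a `currentContext()` around instead of storing a context pointer in every node amounts to keeping a context stack in the visitor. In the minimal sketch below, `DUContextStub`, `AstNode`, `NodeKind` and `ContextTrackingVisitor` are hypothetical stand-ins, not KDevelop's `DUContext` or the generated PHP AST classes; only class and function declarations open a new context.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins: not KDevelop's DUContext or the generated PHP AST classes.
struct DUContextStub {
    std::string name;
    DUContextStub* parent = nullptr;
};

enum class NodeKind { TopStatement, ClassDecl, FunctionDecl, Expression };

struct AstNode {
    NodeKind kind;
    std::string name;
    std::vector<AstNode*> children;
    // Note: no per-node DUContext* member.
};

class ContextTrackingVisitor {
public:
    explicit ContextTrackingVisitor(DUContextStub* top) { stack_.push_back(top); }

    DUContextStub* currentContext() const {
        assert(!stack_.empty());
        return stack_.back();
    }

    void visit(AstNode* node) {
        const bool opensContext =
            node->kind == NodeKind::ClassDecl || node->kind == NodeKind::FunctionDecl;
        if (opensContext) {
            owned_.push_back(std::make_unique<DUContextStub>(
                DUContextStub{node->name, currentContext()}));
            stack_.push_back(owned_.back().get());
        }
        if (node->kind == NodeKind::Expression) {
            // Where the old code read node->ducontext, the visitor asks itself instead.
            expressionContexts_.push_back(currentContext()->name);
        }
        for (AstNode* child : node->children) visit(child);
        if (opensContext) stack_.pop_back();
    }

    const std::vector<std::string>& expressionContexts() const { return expressionContexts_; }

private:
    std::vector<DUContextStub*> stack_;
    std::vector<std::unique_ptr<DUContextStub>> owned_;
    std::vector<std::string> expressionContexts_;
};
```

Only the few node kinds that open a context pay for it, and only while they are being visited; the pointer no longer lives in every node of the tree.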

> > - do we really want to support gigantic source files, or why else do we make
> > startToken & endToken a qint64? Just making them int would save us 8
> > bytes per node, i.e. 20%.
> 
> Hmm, I guess qint32 would be quite enough for a single file; even a
> qint26 (if it existed) would be enough. The original kdev-pg uses
> size_t, fwiw.

On my 64-bit machine, size_t is 64 bits wide (like qint64), while int is 32 bits (like qint32).

I still think we should change that to int; what do you guys think? I'll re-run
my massif test, since I can't believe that the difference was so small
yesterday...
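How much the switch from `qint64` to `int` saves depends on the rest of the node's layout, because the compiler pads members to their alignment. The sketch below uses two hypothetical node layouts (the real generated nodes carry other members), with `std::int64_t` standing in for `qint64` so it compiles without Qt; `toTokenIndex` is a hypothetical helper that makes the "no file has more than INT_MAX tokens" assumption explicit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <limits>

struct DUContext;  // opaque here; only used through a pointer

// Hypothetical node layouts; std::int64_t stands in for qint64.
struct NodeWideTokens {
    int kind;
    std::int64_t startToken;
    std::int64_t endToken;
    DUContext* ducontext;
};

struct NodeIntTokens {
    int kind;
    int startToken;
    int endToken;
    DUContext* ducontext;
};

// Narrowing helper: assumes no single file has more than INT_MAX tokens.
int toTokenIndex(std::int64_t index) {
    assert(index >= 0 && index <= std::numeric_limits<int>::max());
    return static_cast<int>(index);
}

void printNodeSizes() {
    std::printf("qint64 tokens: %zu bytes, int tokens: %zu bytes\n",
                sizeof(NodeWideTokens), sizeof(NodeIntTokens));
}
```

On a typical x86-64 Linux build these two structs are 32 and 24 bytes: the three `int`s pack together and a single 4-byte pad remains before the pointer. Member order matters, though: if each token index sat between two pointer members, the 4 bytes freed by each one would simply become padding and `sizeof` would not shrink at all.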

> > - finally, maybe the hardest part: can't we compress the AST _while
> > building it_, i.e. can we somehow drop all these useless
> > logicalAExpression -> logicalBExpression -> logicalCExpression -> ... chains?
> > Imo we should at least make that one node and work with members (e.g.
> > logicalExpression->type == LOGICAL_XOR) and then a pointer to either a "real"
> > (non-logical) expression or to a nested logical expression... Maybe that
> > could save us some more MB...
> 
> As I said above, taking some extra memory during the parsing process is
> ok if you can properly free it again. So a valuable approach might be to
> either drop the AST stuff completely (AFAIK C++ does that) or generate a
> compact AST. I'm doing the latter for QMake support (though more for the
> "easy to work with" reason than for memory consumption), and it was mostly
> "dumb typing" work after having defined a compact AST.
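Compressing the logicalAExpression -> logicalBExpression -> ... chain while building it can be illustrated with a hand-written precedence-climbing parser. This is a swapped-in technique, not what KDevelop-PG-Qt generates; the node types, the `nodesCreated` counter and the operator set (PHP's `or`, `xor`, `and` keywords, from lowest to highest precedence) are assumptions for illustration. A bare operand yields exactly one node, and a `LogicalExpr` node with an `op` member is created only when an operator actually appears.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compact AST: one node type for every logical operator,
// instead of one node type per precedence level.
struct Expr {
    virtual ~Expr() = default;
};

struct Operand : Expr {
    std::string name;
    explicit Operand(std::string n) : name(std::move(n)) {}
};

enum class LogicalOp { Or, Xor, And };

struct LogicalExpr : Expr {
    LogicalOp op;
    std::unique_ptr<Expr> lhs, rhs;
    LogicalExpr(LogicalOp o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
};

// Precedence climbing over pre-split tokens (no error handling in this sketch).
class LogicalParser {
public:
    explicit LogicalParser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    std::unique_ptr<Expr> parse() { return parseLevel(1); }

    std::size_t nodesCreated = 0;

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    // "and" binds tighter than "xor", which binds tighter than "or".
    // Returns 0 for anything that is not a logical operator.
    static int precedence(const std::string& t, LogicalOp& op) {
        if (t == "or")  { op = LogicalOp::Or;  return 1; }
        if (t == "xor") { op = LogicalOp::Xor; return 2; }
        if (t == "and") { op = LogicalOp::And; return 3; }
        return 0;
    }

    std::unique_ptr<Expr> parseOperand() {
        assert(pos_ < toks_.size() && "expected an operand");
        ++nodesCreated;
        return std::make_unique<Operand>(toks_[pos_++]);
    }

    std::unique_ptr<Expr> parseLevel(int minPrec) {
        std::unique_ptr<Expr> lhs = parseOperand();
        LogicalOp op{};
        int prec = 0;
        // lhs is wrapped only when an operator of sufficient precedence follows;
        // a lone operand is returned as-is, with no per-level wrapper nodes.
        while (pos_ < toks_.size() && (prec = precedence(toks_[pos_], op)) >= minPrec) {
            ++pos_;  // consume the operator
            std::unique_ptr<Expr> rhs = parseLevel(prec + 1);  // left-associative
            ++nodesCreated;
            lhs = std::make_unique<LogicalExpr>(op, std::move(lhs), std::move(rhs));
        }
        return lhs;
    }
};
```

For `a or b and c` this builds five nodes (three operands, an `or` node and an `and` node), whereas one node per precedence level would wrap every operand in several pass-through nodes.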

I remember; you told me that once already. I'll have to look at the QMake
sources. But to clarify: you are still using a parser generator? As I said
above, I don't want to give that up.
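Generating a compact AST does not require dropping the parser generator: the generated parser can still build its verbose tree, which is then translated into a compact one, after which the verbose tree's memory can be released. A minimal sketch, assuming hypothetical `VerboseNode` and `CompactNode` types rather than the classes KDevelop-PG-Qt actually generates: any node with exactly one child and no token text of its own is treated as a pass-through and skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical shape of a generated parse tree node.
struct VerboseNode {
    std::string rule;                   // e.g. "logicalAExpression"
    std::string text;                   // set only on leaf tokens
    std::vector<VerboseNode*> children;
};

// Hypothetical compact node kept after parsing.
struct CompactNode {
    std::string label;                  // token text or rule name
    std::vector<std::unique_ptr<CompactNode>> operands;
};

std::unique_ptr<CompactNode> compact(const VerboseNode& n) {
    assert((!n.rule.empty() || !n.text.empty()) && "node needs a rule or token text");
    // A single child and no text of its own: this level carries no information.
    if (n.children.size() == 1 && n.text.empty())
        return compact(*n.children.front());
    auto out = std::make_unique<CompactNode>();
    out->label = n.text.empty() ? n.rule : n.text;
    for (const VerboseNode* child : n.children)
        out->operands.push_back(compact(*child));
    return out;
}

std::size_t countNodes(const CompactNode& n) {
    std::size_t total = 1;
    for (const auto& child : n.operands) total += countNodes(*child);
    return total;
}
```

The generic pass-through rule is a simplification; a real converter would more likely map each grammar rule to its compact node explicitly, which is the kind of "dumb typing" described above.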

-- 
Milian Wolff
mail at milianw.de
http://milianw.de