On Memory consumption of KDevelop-PG

Sat Dec 19 09:49:47 UTC 2009

On 19.12.09 03:50:01, Milian Wolff wrote:
> I did a massif run with duchainify on MediaWiki and, well, as far as I can see
> there is no apparent memory leak.
> 
> http://mwolff.pastebin.com/f3581a8d6
> 
> What bugs me is the peak... See also:
> http://mwolff.pastebin.com/f4a66d631
> 
> I mean, it's pretty clear why our memory consumption is so high:
> 
> - every visited node gets its own AST node, even those that are just general
> helpers (AST == Tree... duh)

IMHO it's no big deal if the parser needs some 200 MB to parse a single
file, as long as that memory is free for reuse afterwards. In particular,
it must not end up fragmented (which should already be the case thanks to
the memory pool). For how to make the AST smaller, see my answer to your
"compress the AST" question further down.
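To illustrate why a pool keeps fragmentation in check, here is a minimal bump-pointer arena sketch. `ArenaPool` and its members are hypothetical names, not KDevelop-PG's actual pool API: nodes are carved out of large blocks, and the whole parse is released with one `clear()` instead of thousands of individual frees, so nothing is left scattered across the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump-pointer arena illustrating the memory-pool idea.
// This is NOT KDevelop-PG's actual pool class.
class ArenaPool {
public:
    explicit ArenaPool(std::size_t blockSize = 64 * 1024) : blockSize_(blockSize) {}

    // Hand out `size` bytes aligned to `align` from the current block,
    // starting a fresh block when the current one is exhausted.
    void* allocate(std::size_t size, std::size_t align) {
        // This sketch only supports power-of-two alignments up to max_align_t.
        assert(align != 0 && (align & (align - 1)) == 0 &&
               align <= alignof(std::max_align_t));
        std::size_t offset = (used_ + align - 1) / align * align;
        if (blocks_.empty() || offset + size > currentSize_) {
            currentSize_ = size > blockSize_ ? size : blockSize_;
            blocks_.push_back(std::unique_ptr<unsigned char[]>(new unsigned char[currentSize_]));
            offset = 0;
        }
        used_ = offset + size;
        return blocks_.back().get() + offset;
    }

    // Construct a T inside the arena. Destructors never run, so only
    // trivially destructible types (plain AST structs) are accepted.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "arena objects are never destroyed individually");
        void* p = allocate(sizeof(T), alignof(T));
        return new (p) T(std::forward<Args>(args)...);
    }

    // Release the whole parse at once: one delete[] per block, not per node.
    void clear() {
        blocks_.clear();
        used_ = 0;
        currentSize_ = 0;
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    std::size_t blockSize_;
    std::size_t used_ = 0;
    std::size_t currentSize_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};
```

Because destructors are never run, the sketch only accepts trivially destructible types, which is one reason plain-struct AST nodes suit this kind of allocator.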

Also, this is a parser generator, so it has to strike a balance between
being able to parse certain constructs and being efficient. You can
always take the generated parser, stop regenerating it, and treat it as
hand-written code, then adjust it to your needs...

> - is the ducontext pointer on _every_ node really necessary? IMO it only
> makes sense for functions, classes and top-level statements... At least the
> parts in PHP where we use it can be changed to pass a currentContext()
> instead of using the AST member.

No, most probably this is not needed. However, IIRC it's a bit more
cumbersome to add members only to specific AST nodes.
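A sketch of the currentContext() alternative, assuming a hypothetical `ContextBuilder` and a stand-in `Context` type (not the real DUContext API): the visitor keeps the open contexts on a stack, so only the code that opens a context (functions, classes, top-level statements) needs to know about it, and ordinary nodes can drop their pointer member, saving one pointer (8 bytes on 64-bit) per node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a DUContext: just a name and a parent link.
struct Context {
    std::string name;
    Context* parent;
};

// Hypothetical builder that tracks the current context on a stack while it
// walks the AST, so individual nodes do not need a context pointer member.
class ContextBuilder {
public:
    ContextBuilder() { openContext("global"); }

    Context* currentContext() const { return stack_.back(); }

    // Called when entering a function, class, or top-level statement.
    Context* openContext(const std::string& name) {
        Context* parent = stack_.empty() ? nullptr : stack_.back();
        owned_.push_back(std::unique_ptr<Context>(new Context{name, parent}));
        stack_.push_back(owned_.back().get());
        return stack_.back();
    }

    // Called when leaving that node again.
    void closeContext() {
        assert(stack_.size() > 1 && "cannot close the global context");
        stack_.pop_back();
    }

private:
    std::vector<std::unique_ptr<Context>> owned_;  // keeps contexts alive
    std::vector<Context*> stack_;                  // currently open contexts
};
```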

> - do we really want to support gigantic source files, or why else do we make
> startToken & endToken qint64? Just making them int would save us 8 bytes,
> i.e. 20%

Hmm, I guess qint32 would be plenty for a single file; even qint26 (if it
existed) would be enough. The original kdev-pg uses size_t, FWIW.
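The saving is easy to check with a quick layout comparison. The two structs below are hypothetical, stripped-down node layouts (std::int64_t/std::int32_t stand in for qint64/qint32, and real generated nodes carry more members), so the absolute numbers are only indicative; padding decides how much is actually saved.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical node layouts; std::int64_t / std::int32_t stand in for
// Qt's qint64 / qint32. Real generated nodes have more members.
struct NodeWideTokens {
    int kind;
    std::int64_t startToken;
    std::int64_t endToken;
    void* ducontext;
};

struct NodeNarrowTokens {
    int kind;
    std::int32_t startToken;
    std::int32_t endToken;
    void* ducontext;
};

// Largest token index a 32-bit field can hold: 2^31 - 1 tokens.
constexpr std::int64_t kMaxTokenIndex32 = std::numeric_limits<std::int32_t>::max();

// Checked narrowing for code that still computes indices as 64-bit values.
inline std::int32_t toTokenIndex(std::int64_t index) {
    assert(index >= 0 && index <= kMaxTokenIndex32);
    return static_cast<std::int32_t>(index);
}
```

On typical x86-64 compilers these come out at 32 and 24 bytes. A 32-bit index still allows 2^31 - 1 tokens per file, far beyond any realistic source file.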

> - finally, maybe the hardest part: Can't we compress the AST _while building
> it_, i.e. can we somehow drop all these useless logicalAExpression ->
> logicalBExpression -> logicalCExpression -> ...? IMO we should at least make
> that one node and work with members (e.g. logicalExpression->type ==
> LOGICAL_XOR) and then a pointer to either a "real" (non-logical) expression or
> to a nested logical expression... Maybe that could save us some more MB...
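A sketch of how collapsing the chain while parsing could look, assuming a hypothetical single `Expr` node type and PHP-like precedence (or < xor < and); this is plain precedence climbing written by hand, not something KDevelop-PG generates. A bare operand comes back as one leaf instead of a chain of wrappers, and each operator produces exactly one node.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compact node: either a leaf operand or one logical operator
// with two children. There is no separate node type per precedence level.
enum class LogicalOp { None, Or, Xor, And };

struct Expr {
    LogicalOp op = LogicalOp::None;   // None means "plain operand"
    std::string operand;              // set when op == None
    std::unique_ptr<Expr> lhs, rhs;   // set when op != None
};

// Binding strength, loosely following PHP's `or` < `xor` < `and`.
int precedence(LogicalOp op) {
    switch (op) {
        case LogicalOp::Or:  return 1;
        case LogicalOp::Xor: return 2;
        case LogicalOp::And: return 3;
        default:             return 0;
    }
}

LogicalOp toOp(const std::string& tok) {
    if (tok == "or") return LogicalOp::Or;
    if (tok == "xor") return LogicalOp::Xor;
    if (tok == "and") return LogicalOp::And;
    return LogicalOp::None;
}

// Precedence climbing over a pre-split token list. Nodes are only created
// when an operator is actually present.
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    std::unique_ptr<Expr> parse(int minPrec = 1) {
        std::unique_ptr<Expr> lhs = parseOperand();
        while (pos_ < toks_.size()) {
            LogicalOp op = toOp(toks_[pos_]);
            int prec = precedence(op);
            if (op == LogicalOp::None || prec < minPrec) break;
            ++pos_;                                        // consume the operator
            std::unique_ptr<Expr> rhs = parse(prec + 1);   // left-associative
            std::unique_ptr<Expr> node(new Expr);
            node->op = op;
            node->lhs = std::move(lhs);
            node->rhs = std::move(rhs);
            lhs = std::move(node);
        }
        return lhs;
    }

private:
    std::unique_ptr<Expr> parseOperand() {
        assert(pos_ < toks_.size() && "expected an operand");
        std::unique_ptr<Expr> leaf(new Expr);
        leaf->operand = toks_[pos_++];
        return leaf;
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

// Count nodes to compare against a one-class-per-rule tree.
std::size_t countNodes(const Expr* e) {
    if (!e) return 0;
    return 1 + countNodes(e->lhs.get()) + countNodes(e->rhs.get());
}
```

For `a and b or c` this builds Or(And(a, b), c), five nodes in total; a one-class-per-rule tree would additionally wrap the operands in nodes for every intermediate precedence level.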

As I said above, taking some extra memory during parsing is ok as long
as you can properly free it again. So a valuable approach might be to
either drop the AST completely (AFAIK the C++ support does that) or to
generate a compact AST. I'm doing the latter for QMake support (though
more for the "easy to work with" reason than for memory consumption),
and once I had defined a compact AST it was mostly "dumb typing" work.
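The "compact AST" route can be sketched as a conversion pass that runs after parsing, so the big generated tree can then be freed in one go. Both node types here are hypothetical: `VerboseNode` mimics a one-class-per-rule tree, and `compact()` simply skips wrapper nodes that have no operator and a single child, copying what it keeps so nothing points into the freed tree.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical verbose node, shaped like a generated one-class-per-rule AST:
// each precedence level wraps the next even when no operator is present.
struct VerboseNode {
    std::string rule;                          // e.g. "logicalOrExpression" (readability only)
    std::string op;                            // operator token, empty if none
    std::string text;                          // identifier for leaves
    std::vector<const VerboseNode*> children;
};

// Hypothetical compact node that owns copies of everything it keeps.
struct CompactExpr {
    std::string op;                                    // empty for leaves
    std::string text;
    std::vector<std::unique_ptr<CompactExpr>> operands;
};

std::unique_ptr<CompactExpr> compact(const VerboseNode* n) {
    assert(n != nullptr);
    // Skip pure wrappers: no operator and exactly one child.
    while (n->op.empty() && n->children.size() == 1) n = n->children[0];
    std::unique_ptr<CompactExpr> out(new CompactExpr);
    out->op = n->op;
    out->text = n->text;
    for (const VerboseNode* child : n->children) out->operands.push_back(compact(child));
    return out;
}
```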

Just some thoughts from me :)

Andreas
