What is going on in language part land: a summary!

Fri Aug 4 17:00:50 UTC 2006

Hi All,

I just wanted to lay out a summary of the ongoing discussions about the 
parsers.  Roberto was recently in the #kdevelop IRC channel, and a number of 
us have been talking via email about how to proceed with the various parsers.

The current situation:
1. kdevelop-pg-generated C# and Java parsers that do not have a codemodel.
2. A hand-written C++ parser that does have a codemodel, but one that is 
lacking.

Because hand-coding a codemodel for every language parser is difficult, the 
thought was to see whether kdevelop-pg could be amended to generate them.  
However, this would still leave us with a C++ codemodel that isn't good 
enough for our needs.  The hard part is determining exactly what these 
codemodels should look like and what they should contain.

Roberto has discussed this in a number of ways.  I think the summary is that a 
codemodel is used in addition to the AST for three reasons:

1.  Performance and memory usage.  The AST can be resource-hungry, 
especially in terms of memory.
2.  The AST does not contain *scope* and *type* information.  The codemodel 
does.
3.  The codemodel's API makes more sense to developers and can be easier to 
use and manipulate.

I made a suggestion that perhaps storing the AST *wouldn't* be such a huge 
burden in terms of memory.  If this is so, then perhaps it makes sense to put 
aside #1 and see what we can do about #2 and #3.

#2 is the real bear for me.  I don't know what would be involved in 
modifying kdevelop-pg to include scope and type information.  I also don't 
know how it would affect the DUChain that Hamish has been working on.

#3 is also a bit of a mystery.  Perhaps we can write some convenience 
functions that abstract away the esoteric parts of the AST but still use the 
AST as the data store, rather than copying that information into a separate 
structure as we do with the codemodel.
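A rough C++ sketch of such a convenience layer might look like the following.  
All names here (`AstNode`, `NodeKind`, `ClassView`, `addChild`) are 
hypothetical and are not kdevelop-pg's generated types.  The view holds only 
a pointer into the AST and answers codemodel-style questions by walking the 
tree on demand, so nothing is copied into a second structure.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node kinds; real generated parsers define their own.
enum class NodeKind { Class, Function, Field };

// Hypothetical AST node: a kind, a name, and owned child nodes.
struct AstNode {
    NodeKind kind;
    std::string name;
    std::vector<std::unique_ptr<AstNode>> children;
};

// Convenience helper for building small trees by hand.
AstNode& addChild(AstNode& parent, NodeKind kind, std::string name) {
    parent.children.push_back(std::unique_ptr<AstNode>(
        new AstNode{kind, std::move(name), {}}));
    return *parent.children.back();
}

// Codemodel-style view over a class node.  It stores only a pointer into
// the AST, so the tree itself remains the single data store.
class ClassView {
public:
    explicit ClassView(const AstNode* node) : node_(node) {
        assert(node_ && node_->kind == NodeKind::Class);
    }

    const std::string& name() const { return node_->name; }

    // Collects member function names from the class's direct children,
    // walking the AST each time it is called.
    std::vector<std::string> functionNames() const {
        std::vector<std::string> result;
        for (const auto& child : node_->children) {
            if (child->kind == NodeKind::Function) {
                result.push_back(child->name);
            }
        }
        return result;
    }

private:
    const AstNode* node_;
};
```

The trade-off is that every query walks the tree again; caching results would 
save time but would reintroduce a copy of the data.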

Anyway, if we _can_ solve these problems, then I think we should.  
Hand-coding a codemodel for each language part just increases the amount of 
work for an already beleaguered group of maintainers.

Roberto also brought up the idea of storing the AST on disk to cut down on 
memory consumption.  Eclipse uses the idea of a persistent AST, no?  
For storing it to disk, Roberto suggested we look at Google's sparsehash 
library:  http://goog-sparsehash.sourceforge.net/
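As a rough illustration of what a persistent AST could mean, the C++ sketch 
below writes a tree to a stream in pre-order and reads it back, so a parsed 
file could be dropped from memory and reloaded later.  It does not use 
sparsehash, and `AstNode`, `save`, `load` and `roundTrip` are hypothetical 
names; a real on-disk cache would also need versioning and error reporting.

```cpp
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: an integer kind, a name, and owned children.
struct AstNode {
    int kind;
    std::string name;
    std::vector<std::unique_ptr<AstNode>> children;
};

// Writes one node per line in pre-order: kind, name length, name bytes,
// child count.  The length prefix lets names contain spaces.
void save(const AstNode& node, std::ostream& out) {
    out << node.kind << ' ' << node.name.size() << ' ' << node.name << ' '
        << node.children.size() << '\n';
    for (const auto& child : node.children) {
        save(*child, out);
    }
}

// Reads a tree written by save(); returns nullptr if the stream is truncated.
std::unique_ptr<AstNode> load(std::istream& in) {
    std::unique_ptr<AstNode> node(new AstNode{0, std::string(), {}});
    std::size_t nameLength = 0;
    std::size_t childCount = 0;
    if (!(in >> node->kind >> nameLength)) return nullptr;
    in.get();  // skip the single space written before the name
    node->name.resize(nameLength);
    if (nameLength > 0 &&
        !in.read(&node->name[0], static_cast<std::streamsize>(nameLength))) {
        return nullptr;
    }
    if (!(in >> childCount)) return nullptr;
    for (std::size_t i = 0; i < childCount; ++i) {
        std::unique_ptr<AstNode> child = load(in);
        if (!child) return nullptr;
        node->children.push_back(std::move(child));
    }
    return node;
}

// Round-trips a tree through an in-memory stream; a real cache would
// write to a file instead.
std::unique_ptr<AstNode> roundTrip(const AstNode& root) {
    std::stringstream buffer;
    save(root, buffer);
    return load(buffer);
}
```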

Another thing I want to keep an eye on is Roberto's suggestion that we 
think about writing a C++ grammar file for kdevelop-pg.  If all of the 
parsers, including C++, could use the same generator, that would be a 
real boon.

However, we have no volunteers for this, and it would likely be a difficult 
task.  Roberto seems to think that kdevelop-pg is capable of handling it, 
though.  It is worth keeping in mind.

Adam