What is going on in language part land: a summary!
Adam Treat
treat at kde.org
Fri Aug 4 17:00:50 UTC 2006
Hi All,
I just wanted to lay out a summary of discussions that are going on with the
parsers. Roberto was recently in the #kdevelop IRC channel and a number of
us have been talking via email about how to proceed with the various parsers.
The current situation:
1. kdevelop-pg-generated C# and Java parsers that do not have a codemodel.
2. A hand-written C++ parser that does have a codemodel, but one that is
lacking.
Because of the difficulty in coding a codemodel for every language parser by
hand, the thought was to see if kdevelop-pg could be amended to generate
them. However, this would still leave us with a codemodel in C++ that isn't
good enough for our needs. The difficulty is determining what exactly these
codemodels should look like and what they should contain.
Roberto has discussed this in a number of ways. I think the summary is that a
codemodel is used in addition to the AST for three reasons:
1. Performance and memory usage. The AST can be resource hungry and memory
intensive.
2. The AST does not contain *scope* and *type* information. The codemodel
does.
3. The codemodel's API makes more sense to developers and can be easier to
use and manipulate.
I made a suggestion that perhaps storing the AST *wouldn't* be such a huge
burden in terms of memory. If this is so, then perhaps it makes sense to put
aside #1 and see what we can do about #2 and #3.
#2. is the real bear to me. I don't know what would be involved with
modifying kdevelop-pg to include scope and type information. I also don't
know how it would affect the DUChain that Hamish has been working on.
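To make the "scope and type" gap concrete, here is a minimal sketch of the kind of information a bare AST lacks: a chain of scopes that maps a name to its declared type. The Scope struct and its members are hypothetical names made up for illustration. This is not the DUChain API and not something kdevelop-pg generates today.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical scope record, for illustration only.
// Each scope knows its enclosing scope and the names declared directly in it.
struct Scope {
    const Scope *parent = nullptr;                       // enclosing scope, or nullptr for the global scope
    std::unordered_map<std::string, std::string> types;  // declared name -> type name

    // Record a declaration in this scope.
    void declare(const std::string &name, const std::string &type) {
        types[name] = type;
    }

    // Resolve a name by walking outward through the enclosing scopes.
    // Returns nullptr if the name is not declared anywhere in the chain.
    const std::string *lookup(const std::string &name) const {
        for (const Scope *s = this; s != nullptr; s = s->parent) {
            auto it = s->types.find(name);
            if (it != s->types.end())
                return &it->second;
        }
        return nullptr;
    }
};
```

A generated parser would have to create one such record per block, class, or namespace while building the AST, and that bookkeeping is the part that differs from language to language.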
#3. is also a bit of a mystery. Perhaps we can write some convenience
functions that would abstract the esoteric parts of the AST, but still use
the AST as the datastore, rather than copying that information into another
structure like we do with the codemodel.
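As a rough sketch of what such convenience functions might look like, assume a simplified AST node type (AstNode below is hypothetical; the nodes kdevelop-pg generates look different). ClassView is a thin, non-owning wrapper that answers codemodel-style questions by reading the AST directly, so nothing is copied into a second structure.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified AST node; real generated nodes differ.
struct AstNode {
    enum Kind { TranslationUnit, ClassDecl, FunctionDecl, Identifier };
    Kind kind = TranslationUnit;
    std::string text;                                 // token text, used for Identifier nodes
    std::vector<std::unique_ptr<AstNode>> children;
};

// Small helper so that example trees are easy to build.
std::unique_ptr<AstNode> makeNode(AstNode::Kind kind, std::string text = {}) {
    auto node = std::make_unique<AstNode>();
    node->kind = kind;
    node->text = std::move(text);
    return node;
}

// Non-owning view over a ClassDecl node. The AST stays the only datastore.
class ClassView {
public:
    explicit ClassView(const AstNode &node) : m_node(node) {
        assert(node.kind == AstNode::ClassDecl);
    }

    // Name of the class: the first Identifier child, or empty if there is none.
    std::string name() const {
        for (const auto &child : m_node.children)
            if (child->kind == AstNode::Identifier)
                return child->text;
        return std::string();
    }

    // Names of the functions declared directly inside the class.
    std::vector<std::string> methodNames() const {
        std::vector<std::string> names;
        for (const auto &child : m_node.children) {
            if (child->kind != AstNode::FunctionDecl)
                continue;
            for (const auto &grandchild : child->children) {
                if (grandchild->kind == AstNode::Identifier) {
                    names.push_back(grandchild->text);
                    break;
                }
            }
        }
        return names;
    }

private:
    const AstNode &m_node;
};
```

The trade-off is that every call walks the tree again, so this only pays off if keeping the AST around really is affordable, as suggested above.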
Anyway, if we _can_ solve these problems then I think we should. Hand coding
a codemodel for each language part just increases the amount of work for an
already beleaguered group of maintainers.
Roberto brought up the idea of storing the AST to disk to also cut down on the
memory consumption. The idea of a persistent AST is used in Eclipse, no?
For storing it to disk, Roberto suggested we look at Google's sparsehash:
http://goog-sparsehash.sourceforge.net/
Another thing that I want to keep an eye on is Roberto's suggestion that we
should think about writing a C++ grammar file for kdevelop-pg. If all of the
parsers, including C++, could be using the same generator, well that'd be a
real boon.
However, we have no volunteers for this and it would likely be a difficult
task. Roberto seems to think that kdevelop-pg is in a state that could
handle it though. It is good to keep in mind.
Adam
More information about the KDevelop-devel
mailing list