What is going on in language part land

Sat Aug 5 12:01:23 UTC 2006

On Friday, 4. August 2006 19:00, Adam Treat wrote:
> Hi All,
>
> I just wanted to lay out a summary of discussions that are going on with the
> parsers.  Roberto was recently in the #kdevelop IRC channel and a number of
> us have been talking via email about how to proceed with the various
> parsers.
>
> The current situation:
> 1. kdevelop-pg generated C# and Java parsers that do not have a codemodel.
> 2. Hand written C++ parser that does have a codemodel, but one that is
> lacking.
>
> Because of the difficulty in coding a codemodel for every language parser
> by hand, the thought was to see if kdevelop-pg could be amended to generate
> them.

For reference, I'm currently working on a generator (about halfway done) that 
produces codemodels like the current C++ one. I decided to aim for an exact 
replica of the C++ codemodel, because I have no idea what the improved one 
should look like. But once it's there, we can easily change it to fit our 
new needs (for example, referencing the AST instead of storing data itself).

> Roberto has discussed this in a number of ways.  I think the summary is
> that a codemodel is used in addition to the AST for three reasons:
>
> 1.  Performance and memory usage.  The AST can be resource hungry and
> memory intensive.
> 2.  The AST does not contain *scope* and *type* information.  The codemodel
> does.
> 3.  The codemodel's API makes more sense to developers and can be easier to
> use and manipulate.
>
> I made a suggestion that perhaps storing the AST *wouldn't* be such a huge
> burden in terms of memory.  If this is so, then perhaps it makes sense to
> put aside #1 and see what we can do about #2 and #3.
>
> #2. is the real bear to me.  I don't know what would be involved with
> modifying kdevelop-pg to include scope and type information.  I also don't
> know how it would affect the DUChain that Hamish has been working on.

I guess it would be possible to modify kdevelop-pg to include scope and 
type information, but I would like a more detailed definition of what that 
information actually consists of.

For scopes, I could imagine that it should be possible to access the parent 
scope from any (deeply nested) AST member further below. Maybe the scope AST 
items could carry an additional compulsory "name" field and a list of 
child scopes. Would that be it?
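
As a rough sketch of that scope idea: every AST node keeps a link to its
parent node, and scope nodes additionally carry a name and a list of child
scopes. All names here (AstNode, ScopeNode, enclosingScope, and so on) are
hypothetical and only illustrate the shape; they are not what kdevelop-pg
generates today.

```cpp
#include <string>
#include <vector>

// Hypothetical node types; the real generated AST looks different.
struct ScopeNode;

struct AstNode {
    AstNode* parent = nullptr;               // enclosing AST node, if any
    virtual ~AstNode() = default;
    virtual ScopeNode* asScope() { return nullptr; }
};

struct ScopeNode : AstNode {
    std::string name;                        // compulsory scope name
    std::vector<ScopeNode*> childScopes;     // directly nested scopes
    ScopeNode* asScope() override { return this; }
};

// Register `child` as a nested scope of `parent` and set its parent link.
inline void addChildScope(ScopeNode& parent, ScopeNode& child) {
    child.parent = &parent;
    parent.childScopes.push_back(&child);
}

// Walk up the parent links until the nearest enclosing scope is found.
inline ScopeNode* enclosingScope(const AstNode* node) {
    for (AstNode* p = node ? node->parent : nullptr; p; p = p->parent)
        if (ScopeNode* s = p->asScope()) return s;
    return nullptr;
}

// Build a qualified name such as "Outer::Inner" for a scope.
inline std::string qualifiedName(const ScopeNode* scope) {
    std::string result;
    for (const ScopeNode* s = scope; s; s = enclosingScope(s))
        result = result.empty() ? s->name : s->name + "::" + result;
    return result;
}
```

With something like this, a deeply nested node only needs to call
enclosingScope() to reach its scope, and the child list allows walking
downwards as well.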

For type information, I have no idea what's needed in addition to what's 
already in the AST. Well, a toString() method maybe, and an equality 
operator. What would you define as "type information"?
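
To make the question concrete, here is a minimal sketch of a type
descriptor that offers just those two things, a toString() method and an
equality operator. The TypeInfo struct and its fields are assumptions made
up for illustration; a real definition of "type information" would likely
need more (templates, references, function types, ...).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type descriptor; not part of the existing AST or codemodel.
struct TypeInfo {
    std::vector<std::string> qualifiedName;  // e.g. {"std", "string"}
    bool isConst = false;
    std::size_t pointerDepth = 0;            // number of trailing '*'

    // Render the type in a C++-like spelling.
    std::string toString() const {
        std::string s = isConst ? "const " : "";
        for (std::size_t i = 0; i < qualifiedName.size(); ++i) {
            if (i) s += "::";
            s += qualifiedName[i];
        }
        s.append(pointerDepth, '*');
        return s;
    }
};

// Two types are equal when all of their components match.
inline bool operator==(const TypeInfo& a, const TypeInfo& b) {
    return a.qualifiedName == b.qualifiedName
        && a.isConst == b.isConst
        && a.pointerDepth == b.pointerDepth;
}

inline bool operator!=(const TypeInfo& a, const TypeInfo& b) {
    return !(a == b);
}
```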

> #3. is also a bit of a mystery.  Perhaps we can write some convenience
> functions that would abstract the esoteric parts of the AST, but still use
> the AST as the datastore, rather than copying that information into another
> structure like we do with the codemodel.

Agreed.
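
One possible shape for such convenience functions is a thin read-only view
that holds a pointer to the AST node and answers questions on demand,
instead of copying the data into a separate model. ClassAst, MethodAst and
ClassView are hypothetical names for this sketch only.

```cpp
#include <string>
#include <vector>

// Hypothetical AST node shapes, standing in for generated ones.
struct MethodAst {
    std::string name;
};

struct ClassAst {
    std::string name;
    std::vector<MethodAst> methods;
};

// Thin view over a class node: the AST stays the only datastore.
class ClassView {
public:
    explicit ClassView(const ClassAst* node) : m_node(node) {}

    const std::string& name() const { return m_node->name; }

    // Collected on each call from the AST; nothing is cached in the view.
    std::vector<std::string> methodNames() const {
        std::vector<std::string> names;
        for (const MethodAst& m : m_node->methods)
            names.push_back(m.name);
        return names;
    }

    bool hasMethod(const std::string& wanted) const {
        for (const MethodAst& m : m_node->methods)
            if (m.name == wanted) return true;
        return false;
    }

private:
    const ClassAst* m_node;  // not owned; must outlive the view
};
```

The view is cheap to create and throw away, so the memory cost stays with
the AST itself.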

> Anyway, if we _can_ solve these problems then I think we should.  Hand
> coding a codemodel for each language part just increases the amount of work
> for an already beleaguered group of maintainers.

Even the current codemodel is a big pile of code monkey work.
The way it looks now, one codemodel definition file (for my new 
codemodel generator) with a little more than 300 lines can almost exactly 
reproduce the existing C++ codemodel, which consists of 3 files of 700, 900 
and 80 LOC (approximately). That seems like an improvement even if we didn't 
change any of the other codemodel stuff.

> Another thing that I want to keep an eye on is Roberto's suggestion that we
> should think about writing a C++ grammar file for kdevelop-pg.  If all of
> the parsers, including C++, could be using the same generator, well that'd
> be a real boon.
>
> However, we have no volunteers for this and it would likely be a difficult
> task.  Roberto seems to think that kdevelop-pg is in a state that could
> handle it though.  It is good to keep in mind.

Hm, ...let's see:
* We have a pre-processor and a lexer, neither of which needs to be replaced
* We have a parser that uses just the same paradigms and solutions that
  kdevelop-pg also uses (er, ...why is that? ;)
* The parser is complete, works, and just needs to be transcribed from
  manually-written C++ to its kdevelop-pg representation.

I mean, it can't be _that_ hard, right?
Seems like it's important enough to try it out.
(Should I do it soon? What about completing my SoC project first?)

The question is rather:
"Do we want kdevelop-pg to produce camel-cased code
instead of c_style_underlines?"
Otherwise the parser will look a lot uglier than before ;)

Have a great time,
  Jakob