What is going on in language part land

Sat Aug 5 16:18:56 UTC 2006

On Saturday 05 August 2006 8:01 am, Jakob Petsovits wrote:
> On Friday, 4. August 2006 19:00, Adam Treat wrote:
> > Hi All,
> >
> > I just wanted to lay out a summary of discussions that are going on with
> > the parsers.  Roberto was recently in the #kdevelop IRC channel and a
> > number of us have been talking via email about how to proceed with the
> > various parsers.
> >
> > The current situation:
> > 1. kdevelop-pg generated C# and Java parsers that do not have a
> >    codemodel.
> > 2. Hand written C++ parser that does have a codemodel, but one
> >    that is lacking.
> >
> > Because of the difficulty in coding a codemodel for every language parser
> > by hand, the thought was to see if kdevelop-pg could be amended to
> > generate them.
>
> For reference, I'm currently working on a generator (about halfway done)
> that produces codemodels like the current C++ one. I decided to get an
> exact replication of the C++ codemodel, because I've got no idea what the
> improved one should look like. But once it's there, we can easily change it
> to fit our new needs. (Like, referencing the AST instead of storing stuff
> by itself.)

This might be the way to go.  Just keep the AST as the underlying datastore 
for the codemodel.  The codemodel will just become a bunch of convenience 
functions for developers to manipulate and query the AST.  I like it.
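
To make that concrete, here is a minimal sketch of what "codemodel as
convenience functions over the AST" could look like.  Everything in it is
an assumption for illustration: AstNode, ClassView and classesIn() are
hypothetical names, and the real kdevelop-pg AST nodes look different.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified AST node (not the kdevelop-pg layout).
struct AstNode {
    enum Kind { TranslationUnit, ClassDecl, FunctionDecl };

    Kind kind;
    std::string name;
    std::vector<std::unique_ptr<AstNode>> children;

    AstNode(Kind k, std::string n) : kind(k), name(std::move(n)) {}

    // Append a child node and return a non-owning pointer to it.
    AstNode* add(Kind k, std::string n) {
        children.push_back(std::unique_ptr<AstNode>(new AstNode(k, std::move(n))));
        return children.back().get();
    }
};

// Hypothetical codemodel item: it stores no data of its own and only
// answers queries by reading the AST node it points at.
class ClassView {
public:
    explicit ClassView(const AstNode* node) : m_node(node) {}

    const std::string& name() const { return m_node->name; }

    // Names of the functions declared directly inside this class.
    std::vector<std::string> functionNames() const {
        std::vector<std::string> names;
        for (const auto& child : m_node->children) {
            if (child->kind == AstNode::FunctionDecl)
                names.push_back(child->name);
        }
        return names;
    }

private:
    const AstNode* m_node;  // non-owning; the AST stays the datastore
};

// Collect a view for every class declared directly below the given node.
std::vector<ClassView> classesIn(const AstNode& root) {
    std::vector<ClassView> result;
    for (const auto& child : root.children) {
        if (child->kind == AstNode::ClassDecl)
            result.push_back(ClassView(child.get()));
    }
    return result;
}
```

The views are only valid as long as the AST is alive, which is the
trade-off of not copying the data into a second structure.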

> > Roberto has discussed this in a number of ways.  I think the summary is
> > that a codemodel is used in addition to the AST for three reasons:
> >
> > 1.  Performance and memory usage.  The AST can be resource hungry and
> > memory intensive.
> > 2.  The AST does not contain *scope* and *type* information.  The
> > codemodel does.
> > 3.  The codemodel's API makes more sense to developers and can be easier
> > to use and manipulate.
> >
> > I made a suggestion that perhaps storing the AST *wouldn't* be such a
> > huge burden in terms of memory.  If this is so, then perhaps it makes
> > sense to put aside #1 and see what we can do about #2 and #3.
> >
> > #2. is the real bear to me.  I don't know what would be involved with
> > modifying kdevelop-pg to include scope and type information.  I also
> > don't know how it would affect the DUChain that Hamish has been working
> > on.
>
> I guess it would be possible to modify kdevelop-pg for including scope and
> type information, but I would like a more detailed definition of what that
> information essentially is.

I think Hamish and Roberto might answer this one better.

> For scopes, I could imagine that it should be possible to access the parent
> scope from any (deeply nested) AST member further below. Maybe with the
> scope AST items containing an additional compulsory "name" field and a list
> of child scopes. Would that be it?
>
> For type information, I have no idea what's needed in addition to what's
> already in the AST. Well, a toString() method maybe, and an equality
> operator. What would you define as "type information"?
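
One possible reading of the scope idea above, purely as a sketch: each
scope item carries a name, a link to its parent scope and a list of child
scopes.  ScopeNode and qualifiedName() are hypothetical names, and the
"::" separator is a C++-flavoured assumption; none of this is something
kdevelop-pg generates today.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope-bearing AST item: a name, a link to the parent
// scope, and a list of child scopes.
struct ScopeNode {
    std::string name;
    ScopeNode* parent;  // null for the outermost scope
    std::vector<std::unique_ptr<ScopeNode>> children;

    ScopeNode(std::string n, ScopeNode* p) : name(std::move(n)), parent(p) {}

    // Create a nested scope and return a non-owning pointer to it.
    ScopeNode* addChild(std::string childName) {
        children.push_back(
            std::unique_ptr<ScopeNode>(new ScopeNode(std::move(childName), this)));
        return children.back().get();
    }
};

// Walk the parent links from a deeply nested scope up to the outermost
// one and build a qualified name such as "KDevelop::Parser".
std::string qualifiedName(const ScopeNode* scope) {
    std::string result;
    for (; scope != nullptr; scope = scope->parent) {
        if (scope->name.empty())
            continue;  // skip unnamed scopes, e.g. the global one
        result = result.empty() ? scope->name : scope->name + "::" + result;
    }
    return result;
}
```

Whether type information needs more than this (plus the toString() and
equality operator mentioned above) is still the open question.
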
>
> > #3. is also a bit of a mystery.  Perhaps we can write some convenience
> > functions that would abstract the esoteric parts of the AST, but still
> > use the AST as the datastore, rather than copying that information into
> > another structure like we do with the codemodel.
>
> Agreed.
>
> > Anyway, if we _can_ solve these problems then I think we should.  Hand
> > coding a codemodel for each language part just increases the amount of
> > work for an already beleaguered group of maintainers.
>
> Even the current codemodel is a big pile of code monkey work.
> The way it looks now, it seems that one codemodel definition file (for my
> new codemodel generator) with a little more than 300 lines can nearly
> exactly generate the existing C++ codemodel with 3 files of 700, 900 and 80
> LOC (approximately). That seems like an improvement even if we didn't
> change all the codemodel stuff.

Great news!

> > Another thing that I want to keep an eye on is Roberto's suggestion that
> > we should think about writing a C++ grammar file for kdevelop-pg.  If all
> > of the parsers, including C++, could be using the same generator, well
> > that'd be a real boon.
> >
> > However, we have no volunteers for this and it would likely be a
> > difficult task.  Roberto seems to think that kdevelop-pg is in a state
> > that could handle it though.  It is good to keep in mind.
>
> Hm, ...let's see:
> * We have a pre-processor and a lexer, neither of which needs to be
>   replaced
> * We have a parser that uses just the same paradigms and solutions
>   that kdevelop-pg also uses (er, ...why is that? ;)
> * The parser is complete, works, and just needs to be transcribed from
>   manually-written C++ to its kdevelop-pg representation.
>
> I mean, it can't be _that_ hard, right?
> Seems like it's important enough to try it out.
> (Should I do it soon? What about completing my SoC project first?)

Hah!!!  I just thought the same thing last night!  I mean, kdevelop-pg was 
architected to output the current C++ parser.  Reverse engineering it to come 
up with a grammar file probably would be a neat hack.

Mattr is right, though.  Let's keep this in mind as a worthy goal for the 
future.

> The question is rather:
> "Do we want kdevelop-pg to produce camel-cased code
> instead of c_style_underlines?"
> Otherwise the parser will look a lot more ugly than before ;)

BTW, a bit off-topic: I was wondering if we couldn't use the Google CTemplate 
library for our templates.  It really is kinda cool and seems very powerful 
since templates can reference each other.

Adam