What is going on in language part land

Adam Treat treat at kde.org
Sat Aug 5 16:18:56 UTC 2006


On Saturday 05 August 2006 8:01 am, Jakob Petsovits wrote:
> On Friday, 4. August 2006 19:00, Adam Treat wrote:
> > Hi All,
> >
> > I just wanted to lay out a summary of discussions that are going on with
> > the parsers.  Roberto was recently in the #kdevelop IRC channel and a
> > number of us have been talking via email about how to proceed with the
> > various parsers.
> >
> > The current situation:
> > 1. kdevelop-pg generated C# and Java parsers that do not have a
> > codemodel.
> > 2. Hand written C++ parser that does have a codemodel, but one
> > that is lacking.
> >
> > Because of the difficulty in coding a codemodel for every language parser
> > by hand, the thought was to see if kdevelop-pg could be amended to
> > generate them.
>
> For reference, I'm currently working on a generator (about halfway done)
> that produces codemodels like the current C++ one. I decided to get an
> exact replication of the C++ codemodel, because I've got no idea what the
> improved one should look like. But once it's there, we can easily change it
> to fit our new needs. (Like, referencing the AST instead of storing stuff
> by itself.)

This might be the way to go.  Just keep the AST as the underlying datastore 
for the codemodel.  The codemodel will just become a bunch of convenience 
functions for developers to manipulate and query the AST.  I like it.
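
To make that idea concrete, here is a minimal sketch of a codemodel item that 
stores nothing itself and only answers queries by walking the AST.  Everything 
in it is an assumption made for illustration: AstNode, ClassItem and their 
members are hypothetical names, not what kdevelop-pg generates today and not 
the existing C++ codemodel API.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node (an assumption, not real kdevelop-pg output).
// Each node knows its parent, so scope lookups can walk upwards.
struct AstNode {
    enum Kind { Namespace, Class, Function };

    Kind kind;
    std::string name;
    AstNode *parent = nullptr;
    std::vector<std::unique_ptr<AstNode>> children;

    AstNode(Kind k, std::string n) : kind(k), name(std::move(n)) {}

    // Append a child node and return a non-owning pointer to it.
    AstNode *add(Kind k, const std::string &n) {
        children.push_back(std::make_unique<AstNode>(k, n));
        children.back()->parent = this;
        return children.back().get();
    }
};

// Hypothetical codemodel item: a thin view holding only a pointer into
// the AST.  No information is copied out of the tree.
class ClassItem {
public:
    explicit ClassItem(const AstNode *node) : m_node(node) {}

    std::string name() const { return m_node->name; }

    // Build "Outer::Inner" by following parent links; unnamed nodes
    // (such as the root) are skipped.
    std::string qualifiedName() const {
        std::string result = m_node->name;
        for (const AstNode *p = m_node->parent; p; p = p->parent) {
            if (!p->name.empty())
                result = p->name + "::" + result;
        }
        return result;
    }

    // Names of the functions declared directly inside this class.
    std::vector<std::string> functionNames() const {
        std::vector<std::string> names;
        for (const auto &child : m_node->children) {
            if (child->kind == AstNode::Function)
                names.push_back(child->name);
        }
        return names;
    }

private:
    const AstNode *m_node;  // non-owning; the AST must outlive the item
};
```

With a tree built as root -> namespace "KDevelop" -> class "Parser" -> function 
"parse", a ClassItem wrapping the class node would report the qualified name 
KDevelop::Parser and a single function name.  The trade-off in this sketch is 
that the AST has to stay alive for as long as any item refers to it, which is 
exactly the memory question raised in #1 below.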

> > Roberto has discussed this in a number of ways.  I think the summary is
> > that a codemodel is used in addition to the AST for three reasons:
> >
> > 1.  Performance and memory usage.  The AST can be resource hungry and
> > memory intensive.
> > 2.  The AST does not contain *scope* and *type* information.  The
> > codemodel does.
> > 3.  The codemodel's API makes more sense to developers and can be easier
> > to use and manipulate.
> >
> > I made a suggestion that perhaps storing the AST *wouldn't* be such a
> > huge burden in terms of memory.  If this is so, then perhaps it makes
> > sense to put aside #1 and see what we can do about #2 and #3.
> >
> > #2. is the real bear to me.  I don't know what would be involved with
> > modifying kdevelop-pg to include scope and type information.  I also
> > don't know how it would affect the DUChain that Hamish has been working
> > on.
>
> I guess it would be possible to modify kdevelop-pg for including scope and
> type information, but I would like a more detailed definition of what that
> information essentially is.

I think Hamish and Roberto might answer this one better.

> For scopes, I could imagine that it should be possible to access the parent
> scope from any (deeply nested) AST member further below. Maybe with the
> scope AST items containing an additional compulsory "name" field and a list
> of child scopes. Would that be it?
>
> For type information, I have no idea what's needed in addition to what's
> already in the AST. Well, a toString() method maybe, and an equality
> operator. What would you define as "type information"?
>
> > #3. is also a bit of a mystery.  Perhaps we can write some convenience
> > functions that would abstract the esoteric parts of the AST, but still
> > use the AST as the datastore, rather than copying that information into
> > another structure like we do with the codemodel.
>
> Agreed.
>
> > Anyway, if we _can_ solve these problems then I think we should.  Hand
> > coding a codemodel for each language part just increases the amount of
> > work for an already beleaguered group of maintainers.
>
> Even the current codemodel is a big pile of code monkey work.
> The way it looks now, it seems that one codemodel definition file (for my
> new codemodel generator) with a little more than 300 lines can nearly
> exactly generate the existing C++ codemodel with 3 files of 700, 900 and 80
> LOC (approximately). That seems like an improvement even if we didn't
> change all the codemodel stuff.

Great news!

> > Another thing that I want to keep an eye on is Roberto's suggestion that
> > we should think about writing a C++ grammar file for kdevelop-pg.  If all
> > of the parsers, including C++, could be using the same generator, well
> > that'd be a real boon.
> >
> > However, we have no volunteers for this and it would likely be a
> > difficult task.  Roberto seems to think that kdevelop-pg is in a state
> > that could handle it though.  It is good to keep in mind.
>
> Hm, ...let's see:
> * We have a pre-processor and a lexer, neither of which needs to be
>   replaced
> * We have a parser that uses just the same paradigms and solutions
>   that kdevelop-pg also uses (er, ...why is that? ;)
> * The parser is complete, works, and just needs to be transcribed from
>   manually-written C++ to its kdevelop-pg representation.
>
> I mean, it can't be _that_ hard, right?
> Seems like it's important enough to try it out.
> (Should I do it soon? What about completing my SoC project first?)

Hah!!!  I just thought the same thing last night!  I mean, kdevelop-pg was 
architected to output the current C++ parser.  Reverse engineering it to come 
up with a grammar file probably would be a neat hack.

Mattr is right, though.  Let's keep this in mind as a worthy goal for the 
future.

> The question is rather:
> "Do we want kdevelop-pg to produce camel-cased code
> instead of c_style_underlines?"
> Otherwise the parser will look a lot more ugly than before ;)

BTW, a bit off-topic: I was wondering if we couldn't use the Google CTemplate 
library for our templates.  It really is kinda cool and seems very powerful 
since templates can reference each other.

Adam
