New parser branch (Was: Dumping the source DOM?)

Wed Jul 13 14:41:06 UTC 2005

On Wednesday 13 July 2005 13:55, Roberto Raggi wrote:
> On Wednesday 13 July 2005 13:43, Sylvain Joyeux wrote:
> > Do you think it would be possible, though, to use gcc-xml (or some other
> > "static" parser) to parse external dependencies and build persistent
> > datastores? It would work around the problems you have parsing C++ in
> > real time (which is far from simple) when advanced functionality
> > is not needed.
>
> it's not about parsing. We already have it. It is about storing symbols.
> Think about it. You have your C++ source file parsed... and now? Well, now
> you have to store the result of the parser in a suitable form for code
> completion and class browsing (and quick lookup). I hope you're not
> thinking of using XML for that. The first version of my parser used XML
> as an intermediate representation (3 years ago), and it was stupid and
> slow. So I wrote Catalog and CodeModel. What we should do is improve
> Catalog and CodeModel and add things like templates, operators, local
> scope, etc.
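Roberto's point is that the hard part is the symbol store, not the parser. As a rough illustration only (this is not the Catalog or CodeModel API, whose code isn't shown here), a store kept sorted by name can answer both exact lookups and code-completion prefix queries with a binary search. The `Symbol` and `SymbolStore` names are hypothetical, and persistence to disk is left out.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str  # fully qualified name, e.g. "QString::arg"
    kind: str  # e.g. "class", "function", "variable"
    file: str  # file the symbol was declared in


class SymbolStore:
    """Hypothetical in-memory symbol table, sorted by name for prefix lookup."""

    def __init__(self):
        self._names = []    # sorted list of distinct symbol names
        self._symbols = {}  # name -> list of Symbol (overloads share a name)

    def add(self, sym):
        if sym.name not in self._symbols:
            bisect.insort(self._names, sym.name)
            self._symbols[sym.name] = []
        self._symbols[sym.name].append(sym)

    def lookup(self, name):
        """Exact lookup, e.g. for 'go to declaration'."""
        return self._symbols.get(name, [])

    def complete(self, prefix):
        """All names starting with prefix, in sorted order."""
        start = bisect.bisect_left(self._names, prefix)
        result = []
        for name in self._names[start:]:
            if not name.startswith(prefix):
                break  # sorted order: no later name can match
            result.append(name)
        return result
```

Keeping names sorted means a completion request costs one binary search plus a scan over the matches, rather than a pass over every stored symbol.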

Ashley Winters is suggesting using an XML translation unit dump for the next
version of the Smoke bindings library. He is also suggesting doing the
runtime introspection via XPath, over XML files stored as text inside the .so
libraries, one per class. Hmm, he has done some sizing and performance
testing, and the results didn't come out too badly.

You lose any comments with a translation unit dump, and they are needed for 
both bindings and IDEs. Also you don't know which include file was associated 
with which class, which is needed if you are going to generate .cpp code for 
a language binding.

We had a discussion about this sort of thing at the Kiev conference. I've
started work on using the bison grammar that is part of Ruby, instead of
regular expressions, for the next KDevelop 3.3 class browser. The only
definition of the Ruby grammar is the bison grammar, and it wouldn't be very
easy to go straight to using Roberto's parser generator. So I'll do the bison
version first, and then the new LL(1) grammar for KDE 4/Qt 4.

I had these questions for him about how his top-down recursive descent
parser compares with a bottom-up one such as Ruby's:

- Speed?
  Roberto has measured his parser and it is faster than a bison-generated one.

- Ease of use?
  It has nearly all the features of bison except associativity and precedence
hints in the grammar. You can't use left recursion, so for Ruby a small
number of grammar rules for lists of method arguments would need to be
changed.

- Error recovery?
  Apparently easier with the new LL(1) generator, although bison seems fine to
me, and I will just have to add a few more 'error' rules to skip to the next
valid token. But top-down parsers have a 'better idea' of what they're
currently doing than bottom-up ones.

- Language independent?
  The parser generator and refactoring engine will be language independent,
and I would like to use Ruby as a test case to ensure they are. Roberto is
keen on Java, and I think he will ensure there is nothing C++-specific that
won't work with a Java parser. It would be nice to have an access type of
'package' as well as the usual 'private', 'protected' and 'public' in the
language-independent parts.

-- Richard