C++ parser

Thu Jun 2 17:48:03 UTC 2005

Hi Hamish,

On Wednesday 01 June 2005 16:53, Hamish Rodda wrote:
> Hi Robe,
> Nice :)  You're planning on implementing incremental parsing / parsing only
> of changed text, correct?  The internals in Kate are certainly ready to
yeap!
> handle providing this information.  It will be easy to associate ASTs with
> ranges in the text, and then when the text changes, the most specific
> changed region (smallest which encapsulates all of the changes) could
> inform either the AST or the parser of the change, eg. via a signal.  The
yeah I like the idea to use the kate internals as an incremental lexer!! I'm 
sure we can make it _right_ for 4.0 :-)

> parser can then attempt to quickly determine what change has occurred, and
> the vast majority of the time, it should be a simple matter for the parser
> to account for this in realtime (or near-realtime if we decide to use
> threads).  It might require the use of an equivalent LocationTable class
> which uses smart cursors to stay up to date... I'm not too familiar with
> the code yet.
>

> Would it make sense to do parsing in the background, ie. for cases where a
> small change triggers a large amount parsing?  Do you think making the kate
> text buffer reentrant would make sense?  I'm just wary of avoiding
> occasional interference to the ui if we decide to switch to realtime
> parsing.
the best will be to try to keep the _state_ we use to compute the attributes 
on the AST, as small as possible. So we can easly recover it

>
> I guess the idea of multi-pass parsing is to shorten the time the parser
> holds things up, at the expense of increased latency of AST data?  Or is it
> for accuracy / speed of parsing?

yes and no :) actually a multi-pass parser is required to parse C++. Because 
the C++ grammar is not LL(N) (for any N). So it's impossible to disambiguate 
the input in one single pass! :( 

The idea is to have a first pass that computes the AST and the symbol table 
for the top-level declarations. And a second pass that compute the AST and 
the local symbol table for the function-definitions. For instance

class C {

  void a_method() {
    (a) + (b);
  }
  int a, b;

};

in this case the input "(a) + (b)" can be reduced to a binary expression "E+E" 
or a cast expression.. ``a'' reduced to type specifier and ``+(b)'' to a 
unary expression. As you can see from the example we don't know the kind of 
the ``a'' and ``b'' while parsing the function definition ``a_method'', so we 
don't know if ``a'' is a typedef, a class name or variable!!!

C++ is full of things like that and with a multi-pass parser we can solve many 
of these problems.

>
> I can't wait to see kdevelop4 with (and I'm imagining :) in-place code
> refactoring, code logic navigation, much improved highlighting, etc etc...
the parser already has some magics stuff for code-refactoring :-) In the next 
days I'll post a couple of example to show this feature!

ciao robe