C++ parser
Roberto Raggi
roberto at kdevelop.org
Thu Jun 2 17:48:03 UTC 2005
Hi Hamish,
On Wednesday 01 June 2005 16:53, Hamish Rodda wrote:
> Hi Robe,
> Nice :) You're planning on implementing incremental parsing / parsing only
> of changed text, correct? The internals in Kate are certainly ready to
yeap!
> handle providing this information. It will be easy to associate ASTs with
> ranges in the text, and then when the text changes, the most specific
> changed region (smallest which encapsulates all of the changes) could
> inform either the AST or the parser of the change, eg. via a signal. The
yeah I like the idea to use the kate internals as an incremental lexer!! I'm
sure we can make it _right_ for 4.0 :-)
> parser can then attempt to quickly determine what change has occurred, and
> the vast majority of the time, it should be a simple matter for the parser
> to account for this in realtime (or near-realtime if we decide to use
> threads). It might require the use of an equivalent LocationTable class
> which uses smart cursors to stay up to date... I'm not too familiar with
> the code yet.
>
> Would it make sense to do parsing in the background, ie. for cases where a
> small change triggers a large amount parsing? Do you think making the kate
> text buffer reentrant would make sense? I'm just wary of avoiding
> occasional interference to the ui if we decide to switch to realtime
> parsing.
the best will be to try to keep the _state_ we use to compute the attributes
on the AST, as small as possible. So we can easly recover it
>
> I guess the idea of multi-pass parsing is to shorten the time the parser
> holds things up, at the expense of increased latency of AST data? Or is it
> for accuracy / speed of parsing?
yes and no :) actually a multi-pass parser is required to parse C++. Because
the C++ grammar is not LL(N) (for any N). So it's impossible to disambiguate
the input in one single pass! :(
The idea is to have a first pass that computes the AST and the symbol table
for the top-level declarations. And a second pass that compute the AST and
the local symbol table for the function-definitions. For instance
class C {
void a_method() {
(a) + (b);
}
int a, b;
};
in this case the input "(a) + (b)" can be reduced to a binary expression "E+E"
or a cast expression.. ``a'' reduced to type specifier and ``+(b)'' to a
unary expression. As you can see from the example we don't know the kind of
the ``a'' and ``b'' while parsing the function definition ``a_method'', so we
don't know if ``a'' is a typedef, a class name or variable!!!
C++ is full of things like that and with a multi-pass parser we can solve many
of these problems.
>
> I can't wait to see kdevelop4 with (and I'm imagining :) in-place code
> refactoring, code logic navigation, much improved highlighting, etc etc...
the parser already has some magics stuff for code-refactoring :-) In the next
days I'll post a couple of example to show this feature!
ciao robe
More information about the KDevelop-devel
mailing list