r++
Roberto Raggi
roberto at kdevelop.org
Thu Aug 25 10:55:05 UTC 2005
Hi Steven,
On Thursday 25 August 2005 06:18, Steven T. Hatton wrote:
> My question is this: do any of the nodes in the parse tree formally
> represent the concrete tokens in the source code being parsed? Can these
> be easily identified? Is it simply a matter of determining if they are
> terminals?
I'll try to explain the design of the KDevelop C++ parser a bit. Maybe it will
help you and other developers contribute to it.
- the file-stream contains `a sequence' of *characters*, so you have the
character at position (offset) `N', `N+1', and so on.
- the token-stream contains `a sequence' of tokens. A token is nothing more
than a "slice" of the file-stream decorated with a `token-type'. You can
think of a token as a pair<start-position, end-position>, where
start-position and end-position are positions in the file-stream.
- the Abstract Syntax Tree is `a tree' created on top of the token-stream.
An AST node contains *two* very important fields, `start_token' and
`end_token' (much like a `token' contains `start-position' and
`end-position'). So an AST node is nothing more than `a sequence' of tokens.
But of course an AST node also contains child nodes, and a `child' node has a
very *important* property:
[*] the position (<start_token, end_token>) of a `child' node is always
contained in the position of its parent node.
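The three layers above can be sketched in a few lines of C++. This is only an illustration of the idea, not the actual r++ data structures: the names `Token', `Node', `token_text', `node_text' and `contains_children' are made up for the example, and it assumes that end positions are exclusive (one past the last element), which the description above does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token: a slice of the file-stream plus a token-type.
struct Token {
    int kind;               // the `token-type'
    std::size_t start_pos;  // offset of the first character in the file-stream
    std::size_t end_pos;    // offset one past the last character (assumption)
};

// Hypothetical AST node: a slice of the token-stream plus child nodes.
struct Node {
    std::size_t start_token;      // index of the first token
    std::size_t end_token;        // index one past the last token (assumption)
    std::vector<Node*> children;  // untyped here; the real AST is typed
};

// Source text covered by a single token.
std::string token_text(const std::string& file, const Token& t) {
    return file.substr(t.start_pos, t.end_pos - t.start_pos);
}

// Source text covered by a node, from its first token to its last token.
std::string node_text(const std::string& file,
                      const std::vector<Token>& tokens,
                      const Node& n) {
    if (n.start_token >= n.end_token) return std::string();
    const std::size_t first = tokens[n.start_token].start_pos;
    const std::size_t last = tokens[n.end_token - 1].end_pos;
    return file.substr(first, last - first);
}

// Checks property [*] recursively: every child lies inside its parent.
bool contains_children(const Node& n) {
    for (const Node* c : n.children) {
        if (c->start_token < n.start_token || c->end_token > n.end_token)
            return false;
        if (!contains_children(*c))
            return false;
    }
    return true;
}
```

With the file-stream "++i" and the two tokens <0, 2> and <2, 3>, a parent node covering tokens <0, 2> and a child covering <1, 2> satisfy [*], and the parent's text is the whole string.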
For example, take a look at `IncrDecrExpressionAST', which represents an
increment/decrement expression. The AST node looks like this:
struct IncrDecrExpressionAST: public ExpressionAST
{
  DECLARE_AST_NODE(IncrDecrExpression)
  // std::size_t start_token, end_token; ### inherited from AST
  std::size_t op;
  ExpressionAST *expression;
};
The `IncrDecrExpressionAST' contains a reference to the `incr/decr' token (the
field `op'). It also contains a reference to another `expression'. The
positions of `op' and `expression' are contained in the position of
`IncrDecrExpressionAST'.
I think this makes things a bit more complicated for `terminals', because
the AST is *typed*: you need a reference to the `IncrDecrExpression' node if
you want to know its `terminals' or its child nodes. So you are *forced* to
reimplement the method DefaultVisitor::visitIncrDecrExpressionAST(..)
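To make that concrete, here is a rough, self-contained sketch of what such a reimplementation could look like. It does not use the real r++ headers: `SketchVisitor', `NameAST', `TerminalCollector' and the `accept' functions are stand-ins invented for the example, and the visit method names are assumptions as well. Only the `op' and `expression' fields follow the struct shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SketchVisitor;  // hypothetical stand-in for DefaultVisitor

struct AST {
    std::size_t start_token = 0, end_token = 0;
    virtual ~AST() = default;
    virtual void accept(SketchVisitor& v) = 0;
};

struct ExpressionAST : AST {};

// Hypothetical leaf node that holds a single identifier token.
struct NameAST : ExpressionAST {
    std::size_t id = 0;  // index of the identifier token
    void accept(SketchVisitor& v) override;
};

struct IncrDecrExpressionAST : ExpressionAST {
    std::size_t op = 0;                   // index of the `++' or `--' token
    ExpressionAST* expression = nullptr;  // operand
    void accept(SketchVisitor& v) override;
};

// The default behaviour only walks child nodes; it knows nothing about
// which fields of a node are terminals.
struct SketchVisitor {
    virtual ~SketchVisitor() = default;
    virtual void visitName(NameAST*) {}
    virtual void visitIncrDecrExpression(IncrDecrExpressionAST* node) {
        if (node->expression) node->expression->accept(*this);
    }
};

void NameAST::accept(SketchVisitor& v) { v.visitName(this); }
void IncrDecrExpressionAST::accept(SketchVisitor& v) {
    v.visitIncrDecrExpression(this);
}

// Because the AST is typed, each visit method has to be reimplemented
// to pick out the token fields of that particular node type.
struct TerminalCollector : SketchVisitor {
    std::vector<std::size_t> terminals;  // token indices, in visit order

    void visitName(NameAST* node) override {
        terminals.push_back(node->id);
    }
    void visitIncrDecrExpression(IncrDecrExpressionAST* node) override {
        terminals.push_back(node->op);                 // the terminal itself
        SketchVisitor::visitIncrDecrExpression(node);  // then the child node
    }
};
```

Note that this collects tokens in visit order, not necessarily in source order: for a postfix expression the `op' token comes after the operand in the file, so a real dumper would probably want to sort the result or compare token indices.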
ciao robe
PS: thanks for the question! While checking the output of your xml-dumper I
found two bugs in r++. One is in the code that sets the position for tokens;
the other is in the code that generates the AST for postfix expressions :-)
I'll fix them tomorrow.