r++

Roberto Raggi roberto at kdevelop.org
Thu Aug 25 10:55:05 UTC 2005


Hi Steven,

On Thursday 25 August 2005 06:18, Steven T. Hatton wrote:
> My question is this: do any of the nodes in the parse tree formally
> represent the concrete tokens in the source code being parsed?  Can these
> be easily identified?  Is it simply a matter of determining if they are
> terminals?

I'll try to explain the design of the KDevelop C++ parser a bit. Maybe it will 
help you and other developers contribute to it. 

  - the file-stream contains `a sequence' of *characters*, so you have the 
character at position (offset) `N', `N+1', and so on.

  - the token-stream contains `a sequence' of tokens. A token is nothing more 
than a "slice" of the file-stream decorated with a `token-type'. You can 
think of a token as a pair<start-position, end-position>, where 
start-position and end-position are positions in the file-stream.

  - the Abstract Syntax Tree is `a tree' created on top of the token-stream. 
An AST node contains *two* very important fields, `start_token' and 
`end_token' (more or less like a `token' contains `start-position' and 
`end-position'). So an AST node is nothing more than `a sequence' of tokens. 
But of course an AST node also contains child nodes. A `child' node has a very 
*important* property:
  
 [*] the position (<start_token, end_token>) of a `child' node is always 
contained in the position of its parent node. 
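
The three layers above can be sketched roughly as follows. This is only an illustration, not the actual r++ code: the names `Token', `Node', `token_text' and `contains_children' are made up for the example, and the ranges are assumed to be half-open (the end index points one past the last element), which the description above does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the real r++ declarations.

// A token: a token-type plus a [start_pos, end_pos) slice of the file-stream.
struct Token {
    int kind;               // token-type (an arbitrary integer code here)
    std::size_t start_pos;  // offset of the first character
    std::size_t end_pos;    // offset one past the last character (assumed)
};

// An AST node: a [start_token, end_token) slice of the token-stream.
struct Node {
    std::size_t start_token;
    std::size_t end_token;
    std::vector<Node*> children;
};

// Recover the concrete source text of a token from the file-stream.
inline std::string token_text(const std::string& file, const Token& t) {
    assert(t.start_pos <= t.end_pos && t.end_pos <= file.size());
    return file.substr(t.start_pos, t.end_pos - t.start_pos);
}

// Property [*]: every child's token range lies inside its parent's range,
// checked recursively for the whole subtree.
inline bool contains_children(const Node& n) {
    for (const Node* c : n.children) {
        if (c->start_token < n.start_token || c->end_token > n.end_token)
            return false;
        if (!contains_children(*c))
            return false;
    }
    return true;
}
```

With this layout, the concrete text behind any node can always be recovered by going from its token range to the character ranges of those tokens.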

For example, take a look at `IncrDecrExpressionAST', which represents an 
increment/decrement expression. The AST node looks like this:

struct IncrDecrExpressionAST: public ExpressionAST
{
  DECLARE_AST_NODE(IncrDecrExpression)

  // std::size_t start_token, end_token;  ### inherited from AST

  std::size_t op;
  ExpressionAST *expression;
};

The `IncrDecrExpressionAST' contains a reference to the `incr/decr' token (the 
field `op'). It also contains a reference to another `expression'. The 
positions of `op' and `expression' are contained in the position of 
`IncrDecrExpressionAST'.

I think this makes things a bit more complicated for `terminals', because 
the AST is *typed*: you need a reference to the `IncrDecrExpression' if 
you want to know its `terminals' or its child nodes. So you are *forced* to 
reimplement the method DefaultVisitor::visitIncrDecrExpressionAST(..).
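
A minimal sketch of what such a reimplementation might look like is below. It uses simplified stand-in classes rather than the real r++ headers: `Visitor', `NameAST', `TerminalCollector', the `accept' method and the shortened `visit...' method names are assumptions made for this example. The point is only that the `op' token is reachable solely through the typed node, so the visitor has to override the method for that node type.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-ins for illustration; not the real r++ declarations.
struct Visitor;

struct AST {
    std::size_t start_token = 0, end_token = 0;
    virtual ~AST() = default;
    virtual void accept(Visitor& v) = 0;
};

struct ExpressionAST : AST {};

// Hypothetical leaf node, assumed to cover exactly one token.
struct NameAST : ExpressionAST {
    void accept(Visitor& v) override;
};

struct IncrDecrExpressionAST : ExpressionAST {
    std::size_t op = 0;                   // index of the incr/decr token
    ExpressionAST* expression = nullptr;  // the operand
    void accept(Visitor& v) override;
};

// Default behaviour: walk into child nodes, ignore terminals.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visitName(NameAST*) {}
    virtual void visitIncrDecrExpression(IncrDecrExpressionAST* node) {
        if (node->expression) node->expression->accept(*this);
    }
};

inline void NameAST::accept(Visitor& v) { v.visitName(this); }
inline void IncrDecrExpressionAST::accept(Visitor& v) {
    v.visitIncrDecrExpression(this);
}

// Collects token indices of terminals; each node type that owns a terminal
// needs its own override, because only the typed node exposes it.
struct TerminalCollector : Visitor {
    std::vector<std::size_t> terminals;

    void visitName(NameAST* node) override {
        terminals.push_back(node->start_token);
    }
    void visitIncrDecrExpression(IncrDecrExpressionAST* node) override {
        terminals.push_back(node->op);           // the terminal of this node
        Visitor::visitIncrDecrExpression(node);  // then the child expression
    }
};
```

Note that this collector records `op' before the operand, so for a postfix expression the indices would not come out in source order; sorting the result afterwards would be one way to deal with that.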

ciao robe

PS: thanks for the question! While checking the output of your xml-dumper I found 
two bugs in r++: one in the code that sets the position for tokens, the 
other in the code that generates the AST for postfix expressions :-) I'll fix 
them tomorrow.





