kdev-pg: lookahead implementation

Mon Feb 6 12:52:04 UTC 2006

Hi Roberto, list,

We're going to have a question item which will be used to do lookaheads.
That will make rules like

   ( ?(declaration) declaration | expression ) SEMICOLON
-> statement ;;

possible, where we only parse the declaration when we really know that it's
going to be one; otherwise we parse the expression rule. A question item can
only hold terminals and symbols, not option items or closures (multiplication
items), which simplifies the implementation.

On the implementation side, I still have not grasped the exact difference
between lookahead and backtracking. If I employ a technique like in my
java_lookahead helper class, where the class does regular parsing with the
only difference that tokens are not consumed, is that backtracking or
lookahead?

Also, would such an implementation make sense for the question item (after 
all, it does lookahead-parsing with LL(1) characteristics too, and can do 
lookahead-in-lookahead) or do we want to go for a more complicated solution?
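
To make the "regular parsing without consuming" idea concrete, here is a
minimal sketch in C++. Everything in it is an assumption for illustration: the
toy token kinds, the Cursor type, and the two toy rules are hypothetical and
are not kdev-pg's generated code or the real java_lookahead class. The
question item runs the ordinary declaration rule on a private copy of the
read position, so the main parser's position is untouched whatever the
outcome.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical token kinds for a toy grammar; not kdev-pg's real token set.
enum class Tok { Ident, Assign, Number, Semicolon, Eof };

// A read position into a token vector. Copying it is cheap, which is what
// lets the lookahead run on its own private position.
struct Cursor {
    const std::vector<Tok>* tokens;
    std::size_t pos;

    explicit Cursor(const std::vector<Tok>& t) : tokens(&t), pos(0) {}

    // la(1) is the next unconsumed token; past the end we report Eof.
    Tok la(std::size_t k = 1) const {
        assert(k >= 1);
        std::size_t i = pos + k - 1;
        return i < tokens->size() ? (*tokens)[i] : Tok::Eof;
    }

    // Consume the next token only if it has the expected kind.
    bool match(Tok t) {
        if (la() != t) return false;
        ++pos;
        return true;
    }
};

// Toy rule: declaration -> Ident Ident   (think "int x")
bool parseDeclaration(Cursor& c) {
    return c.match(Tok::Ident) && c.match(Tok::Ident);
}

// Toy rule: expression -> Number | Ident ( Assign Number )?
bool parseExpression(Cursor& c) {
    if (c.match(Tok::Number)) return true;
    if (!c.match(Tok::Ident)) return false;
    if (c.match(Tok::Assign)) return c.match(Tok::Number);
    return true;
}

// ?(declaration): run the regular rule on a copy of the cursor and keep
// only the yes/no answer. The caller's cursor is never modified.
bool looksLikeDeclaration(const Cursor& c) {
    Cursor probe = c;
    return parseDeclaration(probe);
}

enum class Stmt { Declaration, Expression, Error };

// ( ?(declaration) declaration | expression ) SEMICOLON -> statement
Stmt parseStatement(Cursor& c) {
    Stmt kind;
    if (looksLikeDeclaration(c))
        kind = parseDeclaration(c) ? Stmt::Declaration : Stmt::Error;
    else
        kind = parseExpression(c) ? Stmt::Expression : Stmt::Error;
    if (kind != Stmt::Error && !c.match(Tok::Semicolon)) kind = Stmt::Error;
    return kind;
}
```

In this sketch the main parser never has to undo anything, because only the
probe moves ahead; the price is that a successful declaration is parsed
twice. A probe can also start further probes from its own copy, which would
correspond to lookahead-in-lookahead.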

What should we do with semantic actions (code blocks) while doing lookahead?
After all, at least my Java grammar relies on some state variables (mainly
ltCounter, but also tripleDotOccurred) being in a correct state.
If we skip semantic checks (code conditions) within lookahead itself, we
get incorrect lookahead results, and if we skip code blocks, the semantic
checks become incorrect. On the other hand, if we allow code blocks, they
could mess up the parser structures if used for anything other than parser
state.

I'd like to put all user-specified instance variables (and the constructor)
into a common superclass which is then subclassed by the parser class and the
lookahead helper class. I think it should be possible to copy those variables
to the lookahead class, keeping the ones in the parser class untouched, by
using the assignment operator. Of course, that only works if the variables
are plain values, not pointers, or if the user also provides an operator=()
method. What do you think about that?
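
A rough sketch of that layout in C++ follows. The class names UserState,
Parser and LookaheadHelper and their member functions are hypothetical; only
ltCounter and tripleDotOccurred come from my Java grammar. The helper copies
the shared base part from the parser with the assignment operator, so code
blocks executed during lookahead only ever touch the copy.

```cpp
#include <cassert>

// Hypothetical common superclass holding the user-specified variables.
// Both members are plain values, so the implicit operator= copies them.
struct UserState {
    int ltCounter = 0;
    bool tripleDotOccurred = false;
};

// Hypothetical parser class; the member functions stand in for code blocks.
struct Parser : UserState {
    void enterTypeArguments() { ++ltCounter; }
    void leaveTypeArguments() {
        assert(ltCounter > 0);
        --ltCounter;
    }
};

// Hypothetical lookahead helper: starts from a snapshot of the parser's
// state and runs the same code blocks against its own copy.
struct LookaheadHelper : UserState {
    explicit LookaheadHelper(const Parser& parser) {
        // Assign only the shared base part; the parser stays untouched.
        static_cast<UserState&>(*this) = parser;
    }
    void enterTypeArguments() { ++ltCounter; }
    void sawTripleDot() { tripleDotOccurred = true; }
};
```

If UserState held a raw pointer, the implicit assignment would copy the
pointer and not what it points to, so both objects would share the pointee;
that is the case where a user-provided operator=() would be needed.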

Finally, I'd like to extend the syntax to "?[int]( items )" so that
"?[2]( RBRACE )" would do a LA(2).kind == RBRACE check and
"?( LBRACE )" means "?[1]( LBRACE )".
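
As a sketch of what the check could boil down to for a single terminal,
assuming a hypothetical TokenStream with an LA(k) accessor (the class, the
questionItem helper and the END_OF_INPUT kind are made up for this example
and are not existing kdev-pg API):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// LBRACE and RBRACE are the kinds from the examples above; the other two
// are hypothetical extras for this sketch.
enum TokenKind { LBRACE, RBRACE, IDENTIFIER, END_OF_INPUT };

struct Token {
    TokenKind kind;
};

// Hypothetical token stream with k-token lookahead.
class TokenStream {
public:
    explicit TokenStream(std::vector<Token> tokens)
        : tokens_(std::move(tokens)), index_(0) {}

    // LA(1) is the next unconsumed token, LA(2) the one after it, and so
    // on. Looking past the end yields END_OF_INPUT.
    Token LA(std::size_t k = 1) const {
        assert(k >= 1);
        std::size_t i = index_ + k - 1;
        return i < tokens_.size() ? tokens_[i] : Token{END_OF_INPUT};
    }

    // Advance by one token; does nothing at the end of the input.
    void consume() {
        if (index_ < tokens_.size()) ++index_;
    }

private:
    std::vector<Token> tokens_;
    std::size_t index_;
};

// "?[k]( KIND )" as a plain check; k defaults to 1, so "?( LBRACE )"
// is the same as "?[1]( LBRACE )".
bool questionItem(const TokenStream& ts, TokenKind kind, std::size_t k = 1) {
    return ts.LA(k).kind == kind;
}
```

With a stream holding LBRACE RBRACE, questionItem(ts, RBRACE, 2) plays the
role of "?[2]( RBRACE )" and questionItem(ts, LBRACE) that of "?( LBRACE )".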

So much for lookahead; expect my thoughts and questions on other topics soon,
  Jakob