kdev-pg: lookahead implementation

Tue Feb 7 14:50:11 UTC 2006

Hi Jakob!

On Monday 06 February 2006 12:40, Jakob Petsovits wrote:
>
> On the implementation side, I still have not grasped the exact difference
> between lookahead and backtracking. If I employ a technique like in my
> java_lookahead helper class where the class does regular parsing with the
> only difference that the token is not consumed, is that backtracking or
> lookahead?
...

I'll try to explain my point of view. Don't take it too seriosly I will 
simplify a bit.

java_lookhead is an hack to implement backtracking. Backtracking is a *very* 
powerful tool, but it can decrease *a lot* the performance of the recognizer. 
If it's possible you should use a *fixed* set of tokens (e.g. K tokens) to 
disambiguate the rule. Sometimes you don't know exactly the length of these 
tokens(K), but you know that this set is finite. In this case you can *try* 
to parse the rule and then merge the resulting AST and advance the parser(in 
case of success) or try with another rule. Now the point is what do you want 
to do with the user *actions* (e.g. the C++ code, the conditions, and so 
on..).. well it depends. In general you don't want to execute the user 
actions because they can be very expensive. But, the real problem is that 
they can change the current state and make very hard to recover from an 
error. For instance, suppose you have a user action that will add a symbol 
into the symbol table.. well in this case you have to remove the symbol from 
the symbol table if you have a failure (in general this not easy and for sure 
it's not cheap).

Note that a lot of rules are not LL(K) even if K is infinite!! When you want 
to parse something that is not LL(K) you may want to delay the parse as much 
as possible. For instance Java. It is pretty common to parse a Java 
translation unit in two steps:
 - step1: parse the top level declarations + skip method bodies
 - step2: build the symbol table + parse the method bodies

step2 will use attributes computed in step1 + the symbol table. So for 
instance in step 2 will be pretty easy to disambiguate variables, types, ... 

>
> Also, would such an implementation make sense for the question item (after
> all, it does lookahead-parsing with LL(1) characteristics too, and can do
> lookahead-in-lookahead) or do we want to go for a more complicated
> solution?
well I want to improve it quite a bit.

> I'd like to put all user-specified instance variables (and constructor)
> into a common superclass which is then subclassed by the parser class and
> the lookahead helper class. I think it should be possible to copy those
> variables to the lookahead class and keep the ones from the parser class
> untouched by using the assign operator. Of course, that only works if the
> variables are only values, not pointers, or if the user also provides an
> operator=() method. What do you think about that?
hmm, i don't know.. it looks a bit hacky :-) let's think a bit about it.

>
> Finally, I'd like to extend the syntax to "?[int]( items )" so that
> "?[2]( RBRACE )" would do a LA(2).kind == RBRACE check and
> "?( LBRACE )" means "?[1]( LBRACE )".
this is not necessary. We'll have LL(K) at some point. ?[int] will be 
automagically handled by kdev-pg

>
> So much for lookahead, expect my thoughts and questions on other topics
> soon, Jakob
:-)

ciao robe