more code completion fun!

Wed Mar 14 18:09:18 GMT 2001

Daniel Berlin wrote:
> 
> 
> ANTLR is also known as PCCTS 2.0, if it helps. (PCCTS was Terrence
> Parr's  first go at applying his Phd thesis work on linear approximate
> lookahead to a real generator)
> 

At least it sounds intersting. Hopefully I will find some time to check
it, unfortunately this won't be any time soon.

> >
> > > There is a huge difference in how you write a grammar for an LALR(1)
> > > generator, and how you write one for an LL(k) generator
> > > >
> > > > Those ()? and ()* are great to save some place in writing, but
> > > > apparently they do not lead to carefully designed rules.
> > > >
> > > > How should the parser generator differ between
> > > >
> > > > "++ident" as result from "unary_op unary_op postfix_expr"
> > > >
> > > > and
> > > >
> > > > "++ident" as result from "INC postfix_expr"
> > >
> > > Through the k lookahead.
> >
> > Then why the non-determinsms?
> Because it's approximating.
> Computing LL(k) is, of course,  too expensive (exponential).
> ANTLR is using linear approximate lookahead to do it in linear time
> (by collapsing the billion input sequences into k sets of tokens),
> which means that while it's not going to be non-deterministic in
> practice, ANTLR doesn't know that.
> 
> http://www.jguru.com/jguru/faq/view.jsp?EID=264825
> >
> > I think you're confusing something here. The C/C++ grammar allows
> > generation of both rules.
> 
> I meant the k lookahead in the LL(k) Lexer (ANTLR lexers aren't like
> flex, they are LL(k) as well, mainly because there's no reason for
> them to be different)
> 
> > Both are *legal*. There is no syntactical way to differ between
> > them,
> 
> Sure there is , you aren't thinking right.

Oh Jesus, who told you? Is it already in the news papers? Well, indeed,
the "no thinking right" part happens from time to time.

Seriously, no, there is no way to do this, to syntactically decide,
which derivation the user had in mind, if an ambiguity arises.

You[generic] can formulate the grammar rules in a way, that the
resulting parser is preferring one possible derivation over the other.
But that is you, that adds semantic to the grammar to choose one rule
over the other.
And here we have the loop to my original comment: The person formulating
the rules can choose how to solve ambiguities. He can either adjust the
lookahead, as you mentioned, or reformulate the rules, where possible,
to get rid of the non-determinisms, as I mentioned.
That was my advice you responded to, to formulate the rules more
carefully.

> "++" is never going to be interpreted by the lexer as "unary
> expression unary expression". It'll see "+", you'll look ahead with a
> predicate, see "+" again, and turn it into a single "INC" token.

Yes, but I could have meant it as "unary_expression unary_expression".
Therefore returning the "INC" token would be wrong. An example, where
even the LL(K) lexer will fail:

"a= b+++c;" is a legal statement in C, but there is no chance to
syntactically decide, whether the user meant "a= (b++)+c;" or "a=
b+(++c);" or "a= b+(+(+c));".

"a= b+++++c;" is legal as well, meaning e.g. "(b++)+(++c)".

Your LL(k) lexer will return  "a = b INC INC + c ;", which leads to an
error, because (b++) is not an lvalue and cannot be incremented.

As you see, if you're adjusting the lookahead, you're just preferring
one over the other. That isn't necessarily the correct rule.

Same applies to the lines I snipped. Your point is valid, that you
resolve ambiguities by adjusting the lookahead, but so is mine, that
this is weighting rules and therefore adding semantic and not a correct
syntactical solution, since there can be no for C/C++.

.... 
> Depends on how you look at it.
> Without considering disambiguation rules (IE just the straight
> grammar), yes it is.
> It is easily parsable with LL(k) with backtracking (IE syntactic predicates)
> Look at OpenC++.

Reading that, I recognized we had been talking at cross purposes.
Between "parsable" and "parsing it in the way the user meant it", there
is a - hmmm - difference, isn't it?

> 
> When you take into account the stupid rules Bjarne added for
> disambiguating, it's not even a CFG, and thus, neither LALR or LL,
> because it's not in any way context-free.
> 
> FYI, someone already came up with a complete GNU c parser and
> translator for ANTLR (IE including all the attributes and extensions).
> It's done the right way, which is a GNU C grammar subclass'ed from a
> standard c grammar. It includes a tree parser, too
> http://www.antlr.org/grammars/cgram

I think I definitely should have a look at ANTLR. The example
Christopher gave, was promising.

> I think this pretty much shows you can parse/translate/etc C with an
> LL(k) parser.
> 

I never doubted that.

Best regards,
Heiko

-
to unsubscribe from this list send an email to kdevelop-request at kdevelop.org with the following body:
unsubscribe »your-email-address«