more code completion fun!

Daniel Berlin dberlin at redhat.com
Tue Mar 13 16:58:37 GMT 2001


Heiko Leberer <Heiko_Leberer at non.agilent.com> writes:

> Daniel Berlin wrote:
> > 
> > Heiko Leberer <Heiko_Leberer at non.agilent.com> writes:
> > 
> > > christopher j bottaro wrote:
> > > >
> > >
> > > I'm not familiar with ANTLR, but since it looks quite like to yacc with
> > > some additional features, so I comment on what I think you're doing.
> > 
> > It's not, at all.
> > YACC is LALR(1), ANTLR is LL(k).
> > 
> 
> Not knowing ANTLR, I must admit, that I assumed the lr in ANTLR is a
> hint for lalr.

ANTLR is also known as PCCTS 2.0, if it helps. (PCCTS was Terrence
Parr's  first go at applying his Phd thesis work on linear approximate
lookahead to a real generator)

> 
> > There is a huge difference in how you write a grammar for an LALR(1)
> > generator, and how you write one for an LL(k) generator
> > >
> > > Those ()? and ()* are great to save some place in writing, but
> > > apparently they do not lead to carefully designed rules.
> > >
> > > How should the parser generator differ between
> > >
> > > "++ident" as result from "unary_op unary_op postfix_expr"
> > >
> > > and
> > >
> > > "++ident" as result from "INC postfix_expr"
> > 
> > Through the k lookahead.
> 
> Then why the non-determinsms?
Because it's approximating.
Computing LL(k) is, of course,  too expensive (exponential).
ANTLR is using linear approximate lookahead to do it in linear time
(by collapsing the billion input sequences into k sets of tokens),
which means that while it's not going to be non-deterministic in
practice, ANTLR doesn't know that.

http://www.jguru.com/jguru/faq/view.jsp?EID=264825
> 
> I think you're confusing something here. The C/C++ grammar allows
> generation of both rules. 

I meant the k lookahead in the LL(k) Lexer (ANTLR lexers aren't like
flex, they are LL(k) as well, mainly because there's no reason for
them to be different)

> Both are *legal*. There is no syntactical way to differ between
> them,

Sure there is , you aren't thinking right.

"++" is never going to be interpreted by the lexer as "unary
expression unary expression". It'll see "+", you'll look ahead with a
predicate, see "+" again, and turn it into a single "INC" token.

Or you just specify INC as "++", and unless you have a double plus, it
won't be turned into an INC token.

There is no non-determinisim here with ANTLR, assuming you do it
right. Without a syntactic predicate, or checking the words with a
hashtable (as ANTLR does before returning the token), of course, yer
screwed, unless you take advantage of some implementation detail like
it always  choosing the longest match .

> you need a semantical analysis to choose one or you just need to
> formulate your parsing grammar in a way that you get your preferred
>one.
No, you just need sufficient lookahead, and predicates.

> Semantic analysis is the slow but correct (or more correct) solution for
> the conflicts, a solution that would be used if you need to generate
> code. A change in your grammer to solve ambiguities, is the faster
> solution. This can change the meaning of a rule though, since the
> semantic is disregarded, but it would be sufficient for parsing code to
> provide a code completion feature.
> 
> If you think you can solve this because you have a LL(k) generator,
> you're wrong. C/C++ is *not* LL(k).
Depends on how you look at it.
Without considering disambiguation rules (IE just the straight
grammar), yes it is. 
It is easily parsable with LL(k) with backtracking (IE syntactic predicates)
Look at OpenC++.

When you take into account the stupid rules Bjarne added for
disambiguating, it's not even a CFG, and thus, neither LALR or LL,
because it's not in any way context-free.

FYI, someone already came up with a complete GNU c parser and
translator for ANTLR (IE including all the attributes and extensions).
It's done the right way, which is a GNU C grammar subclass'ed from a
standard c grammar. It includes a tree parser, too
http://www.antlr.org/grammars/cgram

I think this pretty much shows you can parse/translate/etc C with an
LL(k) parser.

--Dan

-
to unsubscribe from this list send an email to kdevelop-request at kdevelop.org with the following body:
unsubscribe »your-email-address«



More information about the KDevelop mailing list