more code completion fun!

christopher j bottaro cjb at cs.utexas.edu
Tue Mar 13 22:06:03 GMT 2001


> Assuming, that (list)* means, possible multiple instances of one of the
> rules in list:
>
> you should redefine (unary_op|INCR|DECR) postfix_expr
>
> Those ()? and ()* are great to save some place in writing, but
> apparently they do not lead to carefully designed rules.
>
> How should the parser generator differ between
>
> "++ident" as result from "unary_op unary_op postfix_expr"
> "++ident" as result from "INC postfix_expr"

the lexer will choose to send to the parser INCR as opposed to '+' '+'
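
to make that concrete, here is a minimal sketch of a longest-match scanner in 
plain c++. it is not the lexer antlr generates; the Tok enum and the 
lexOperators function are made-up names for illustration, and only INCR echoes 
the grammar under discussion. the point is just that "++" is consumed as one 
token before the parser ever sees it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch (not taken from the real grammar,
// apart from the INCR name).
enum class Tok { PLUS, INCR, IDENT, OTHER };

// Longest-match scanning: at each position take the longest operator that
// fits, so "++" becomes a single INCR and never PLUS PLUS.
std::vector<Tok> lexOperators(const std::string& src) {
    std::vector<Tok> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;                                  // whitespace separates tokens
        } else if (c == '+') {
            if (i + 1 < src.size() && src[i + 1] == '+') {
                out.push_back(Tok::INCR);         // two-character operator wins
                i += 2;
            } else {
                out.push_back(Tok::PLUS);
                i += 1;
            }
        } else if (std::isalpha(c) || c == '_') {
            // consume the whole identifier
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            out.push_back(Tok::IDENT);
        } else {
            out.push_back(Tok::OTHER);            // anything this sketch ignores
            ++i;
        }
    }
    assert(i == src.size());                      // every character was consumed
    return out;
}
```

with this, lexOperators("++ident") yields INCR IDENT, so the parser can only 
take the "INCR postfix_expr" path. the "unary_op unary_op" path is only 
reachable when the two plus signs are separated, as in "+ +ident".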

> Additionally this grammar allows  constructs like "++ ++ ++ 123" from
> "INC INC INC postfix_expr).

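i thought this was syntactically legal in c++ (on a variable, anyway; a 
literal like 123 isn't an lvalue, so that gets rejected after parsing), 
although the behavior of such an expression is "undefined".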

> "5.3" can be the result of
>   postfix_expr -> primary_expr -> INTEGER (DOT INTEGER)
>
> or the result of
>   postfix_expr -> primary_expr ()?((DOT) postfix_expr)? -> INTEGER
> ()?((DOT) INTEGER)

good call, can't believe i missed that.

> Christopher, this really isn't meant as an insult, but the ease of use
> is not the most important  feature of a parser generator. You need to
> know, what you want to achieve first, and then how to formulate this.
> Some nice features, that allow to save some lines of generator input are
> not the way to go, if you don't know what you're doing.
>
> Obviously you can achieve with antlr what you can with yacc as well, but
> using those features go hand in hand with being more carefully.

yeah, i guess since i haven't taken a class on this yet, i don't really 
realize everything that is involved.  but i see ambiguities in properly 
written grammars too.  i guess i need to learn to distinguish when 
ambiguities are ok and when they aren't.  the c++ grammar (written for yacc) 
given in the back of Stroustrup's book has quite a few ambiguities.  it also 
allows stuff like 2[2], which, correct me if i'm wrong, has no meaning in 
c++.

also, the reason to use antlr isn't really to make the grammar easier to 
write.  it's because of the code it generates.  you can write rules that take 
arguments and return values.  also, error recovery is supposedly a lot more 
advanced than yacc's (i've only used it very basically though).

the following antlr rule matches a code block that starts with '{' and ends 
with '}'.  while matching, it records the begin line and end line, collects 
all variable declarations (vardecl hands them back in a QList), and 
recursively parses every codeblock nested in the outer codeblock, storing 
those in a QList of CBInfo's.  notice it takes an argument called scope, so 
you know how deep you are in nested codeblocks, and it returns a pointer to a 
CBInfo.  if antlr hits something that isn't right while matching a codeblock, 
an exception is thrown and your rule can catch it.  in this case the handler 
just deletes the CBInfo, sets the pointer to NULL, and the rule returns that.

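codeblock
[int scope]
returns [CBInfo* cb]
{
	CBInfo* rv;         // result of a nested codeblock
	CCVar* temp;        // cursor for walking the declared variables
	QList<CCVar> list;  // variables from a single vardecl
	cb = new CBInfo;
}
	:	ocb:OCB
		(	rv=codeblock[scope+1]
			{	// a nested block that failed to parse comes back as NULL
				if (rv)
					cb->cbs.append(rv);
			}
		|	list=vardecl
			{	for (temp = list.first(); temp; temp = list.next())
					cb->vars.insert(temp->name, temp);
			}
		|	~(OCB|CCB)  // skip any other token
		)*
		ccb:CCB
		{
			cb->beginLine = ocb->getLine();
			cb->endLine = ccb->getLine();
		}
	;
	exception
	catch [ANTLR_USE_NAMESPACE(antlr)RecognitionException& ex] {
		// on a syntax error, throw away the partial result
		if (cb) {
			delete cb;
			cb = NULL;
		}
	}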

the actual method it generates from this rule is almost exactly how a human 
would, i guess, visualize it.
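
for comparison, here is roughly what a hand-written recursive descent version 
could look like.  this is only a sketch under assumptions, not antlr's actual 
output: Kind, Token, BlockInfo and parseBlock are hypothetical stand-ins (no 
Qt, no antlr runtime), variable declarations are left out, and a failed nested 
block fails the whole parse, where the rule above just skips appending it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the lexer tokens and the CBInfo class.
enum class Kind { OCB, CCB, OTHER };   // '{', '}', anything else
struct Token { Kind kind; int line; };

struct BlockInfo {
    int beginLine = 0;
    int endLine = 0;
    int scope = 0;                                    // nesting depth
    std::vector<std::unique_ptr<BlockInfo>> children; // nested blocks
};

// Parses one '{' ... '}' block starting at toks[pos] and advances pos past it.
// Returns nullptr on malformed input (the analogue of the exception handler).
std::unique_ptr<BlockInfo> parseBlock(const std::vector<Token>& toks,
                                      std::size_t& pos, int scope) {
    assert(scope >= 0);
    if (pos >= toks.size() || toks[pos].kind != Kind::OCB)
        return nullptr;                               // must start with '{'

    std::unique_ptr<BlockInfo> cb(new BlockInfo);
    cb->scope = scope;
    cb->beginLine = toks[pos++].line;

    while (pos < toks.size() && toks[pos].kind != Kind::CCB) {
        if (toks[pos].kind == Kind::OCB) {
            // nested block: recurse one level deeper
            std::unique_ptr<BlockInfo> child = parseBlock(toks, pos, scope + 1);
            if (!child)
                return nullptr;                       // partial result is freed
            cb->children.push_back(std::move(child));
        } else {
            ++pos;                                    // skip any other token
        }
    }

    if (pos >= toks.size())
        return nullptr;                               // ran out before '}'
    cb->endLine = toks[pos++].line;
    return cb;
}
```

the shape is the same as the rule: one function per rule, the rule argument 
becomes a function parameter, the return value becomes the function result, 
and the ( ... )* loop becomes a while loop over the lookahead token.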

it would be such a pain to do this with yacc.  sorry for the antlr plug, but 
i think it's really, really cool.  and hopefully, as i learn more about 
grammars and parsing, i can get this code completion stuff down.

christopher
