C++ code completion status report

Wed Jan 9 15:34:28 UTC 2002

On Tuesday 08 January 2002 3:23 pm, Thomas Schilling wrote:
> Hi Richard,
>
> > I had a look at the sources in gcc last night, and have to say I got
> > 'fear of large grammars' from looking at the C++ grammar in
> > gcc/cp/parse.y :). The lexer in lex.c/lex.h (it doesn't use flex) looked
> > as though it might be more easily adaptable. I haven't looked at the
> > preprocessor code yet.
>
> Horrifying, isn't it? But fortunately we only need to parse
> declarations, and only a few statements.
Yes, but if the grammar for everything else was still there (as opposed to 
just skipping tokens that weren't part of declarations), it would be easier 
to use as a basis for code refactoring etc.

>
> > One way of forcing an entry point into a bison grammar might be to use a
> > backdoor into lex. You could have a function which puts lex into a
> > certain start state where it emits a special symbol like
> > 'CODE_COMPLETION_START' which didn't really exist, and then after that
> > the lexer behaves normally.
>
> Sounds good. I'll consider it.
>
> I tested my top-down-parser on a buggy expression and -
> it worked without moaning. But now I need backtracking.
> So I need to think twice before continuing work.
I can help with implementing non-deterministic parsers, but I still don't 
think we need one for the problem in hand.
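
A rough sketch of the lex backdoor quoted above, in case it helps the discussion. This is a toy hand-written lexer, not taken from gcc's lex.c; the names Tok, Token, Lexer and forceCompletionStart() are all made up for illustration. The only point is that the lexer can be armed to emit one synthetic token and then carry on as normal:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token kinds; a real C++ lexer has far more.
enum class Tok { CodeCompletionStart, Identifier, Punct, End };

struct Token {
    Tok kind;
    std::string text;
};

// Toy lexer: identifiers and single-character punctuation only.
class Lexer {
public:
    explicit Lexer(std::string src) : src_(std::move(src)) {}

    // The "backdoor": the next call to next() returns a marker token
    // that never occurs in real source text.
    void forceCompletionStart() { pendingStart_ = true; }

    Token next() {
        assert(pos_ <= src_.size());  // cursor never runs past the input
        if (pendingStart_) {
            pendingStart_ = false;    // one-shot, then lex normally
            return {Tok::CodeCompletionStart, ""};
        }
        while (pos_ < src_.size() && isSpace(src_[pos_])) ++pos_;
        if (pos_ == src_.size()) return {Tok::End, ""};
        if (isIdentStart(src_[pos_])) {
            std::size_t start = pos_;
            while (pos_ < src_.size() && isIdentChar(src_[pos_])) ++pos_;
            return {Tok::Identifier, src_.substr(start, pos_ - start)};
        }
        return {Tok::Punct, std::string(1, src_[pos_++])};
    }

private:
    static bool isSpace(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    static bool isIdentStart(char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    }
    static bool isIdentChar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    }

    std::string src_;
    std::size_t pos_ = 0;
    bool pendingStart_ = false;
};
```

On the bison side the marker would presumably get its own alternative under the start rule, so that seeing it first selects the expression sub-grammar. That part is an assumption about how the grammar would be arranged; I haven't tried it against parse.y.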

> > Another way would be to add extra stuff to the partial expression, so it
> > was always complete and grammatically correct before passing it on to
> > bison.
>
> Hm, but how to find out what causes no errors?
>
> > But I think it would be best to use bison to parse up to the previous
> > complete statement, then just do something simple with regular
> > expressions to pick out the code completion variable in the current
> > statement.
>
> Actually we don't need the previous statement (it can even
> stop the parser if it's buggy). We only have to heed it if it's
> a declaration.
>
> > In the example above, as long as b wasn't declared in the same statement
> > as the expression, it would work ok. You don't need to parse 'a+b.' well
> > enough to evaluate it, just well enough to find the name of identifier
> > b. You just need to look for a type specifier or a cast before the code
> > completion variable with regular expressions and QStrings.
>
> Yes, I also had some thoughts about 'reverse parsing'. But it's also quite
> difficult. Also it may be better if we had a more powerful parser to let
> it be useful for later extensions.
>
> > + b. ==> look for a type declaration on a previous statement
> > (mytype) b ==> use mytype as type
> > mytype b ==> use mytype as type
>
> mytype (b) ==> use mytype as type
But only in a constructor list?
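
To make the regular-expression lookup above concrete, here is a minimal sketch. It uses std::regex and std::string as stand-ins for QStrings, and the function names completionIdentifier() and guessType() are invented for the example. It only covers the '(mytype) b' and 'mytype b' forms; the 'mytype (b)' form is not handled, and keywords are not filtered out, so 'return b' would wrongly report 'return' as the type:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the identifier immediately before a trailing '.' or '->',
// or an empty string if the text does not end in a member access.
std::string completionIdentifier(const std::string& stmt) {
    static const std::regex re(R"(\b([A-Za-z_]\w*)\s*(?:\.|->)\s*$)");
    std::smatch m;
    return std::regex_search(stmt, m, re) ? m[1].str() : std::string();
}

// Guesses the type of 'ident' from the text typed so far.
// A cast '(mytype) ident' wins over a declaration 'mytype ident'.
// 'ident' must be a plain identifier, so it needs no regex escaping.
std::string guessType(const std::string& text, const std::string& ident) {
    assert(!ident.empty());
    const std::string name = R"(([A-Za-z_]\w*))";
    const std::regex cast(R"(\(\s*)" + name + R"(\s*\)\s*)" + ident + R"(\b)");
    const std::regex decl(R"(\b)" + name + R"(\s+)" + ident + R"(\b)");
    std::smatch m;
    if (std::regex_search(text, m, cast)) return m[1].str();
    if (std::regex_search(text, m, decl)) return m[1].str();
    return std::string();  // no type found
}
```

For example, completionIdentifier("a+b.") picks out b without evaluating the expression, and guessType("mytype b; a+b.", "b") then finds mytype from the earlier declaration.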

> And how about this:
> class A { ... }; class B { ... }; class C { ... };
> C operator+(A,B);
> A a; B b;
> ... (a+b). // we should list C's members
>
> Darn C++, uh? ... ;)
Bad coding style if you ask me! If I don't keep the result of 'a+b' in a 
variable and need it again later in the code, it has to be recomputed from 
'a+b'. Anyone who needs code completion (i.e. not the 'super geeks' who can 
remember all the methods in a class library after one read, and who'll be 
using emacs forever anyway :) ) will probably code this in two statements as:

C foo = (a+b);
foo.<some method>;

And if they don't, then code completion won't work. 

> BTW: "a+" could also get CC - if it's a class and has '+' overloaded
>  (so it's like argument hinting)
'a' with nothing before it and 'a+' are the same as '+a'; I would have 
thought the only thing that overrides a previously declared type would be a 
cast in brackets (or 'a' being declared for the first time as 'mytype a').

That's why it's better to only invoke code completion at the user's request - 
you are just creating problems like this otherwise. What if I type 
'foobar->get' and expect to be shown all the methods beginning with 'get..'? 
I think code completion should be 'user driven', and not running all the 
time. We should avoid parsing the entire file every time the user presses a 
character, because it is bound to be too slow for large files.
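
As an illustration of the 'foobar->get' case, a small sketch of what would run once the user asks for completion. Request, splitRequest() and filterByPrefix() are hypothetical names, and the member list is assumed to come from whatever type lookup is already in place:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// What the user had typed when they asked for completion.
struct Request {
    std::string object;  // text before the last '.' or '->'
    std::string prefix;  // partial member name typed so far, may be empty
};

// Splits the typed text at the last member-access operator.
// With no operator present, the whole text is treated as the prefix.
Request splitRequest(const std::string& text) {
    const std::size_t npos = std::string::npos;
    const std::size_t dot = text.rfind('.');
    const std::size_t arrow = text.rfind("->");
    std::size_t opStart = npos;  // index of the operator
    std::size_t cut = npos;      // index just past the operator
    if (dot != npos && (arrow == npos || dot > arrow)) {
        opStart = dot;
        cut = dot + 1;
    } else if (arrow != npos) {
        opStart = arrow;
        cut = arrow + 2;
    }
    if (cut == npos) return {"", text};
    assert(cut <= text.size());  // substr would throw otherwise
    return {text.substr(0, opStart), text.substr(cut)};
}

// Keeps only the member names that begin with 'prefix', in input order.
std::vector<std::string> filterByPrefix(const std::vector<std::string>& members,
                                        const std::string& prefix) {
    std::vector<std::string> matches;
    for (const std::string& m : members) {
        if (m.compare(0, prefix.size(), prefix) == 0) matches.push_back(m);
    }
    return matches;
}
```

This only does work on an explicit request, and only on the text of the current statement, so nothing needs re-parsing on each keystroke.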

-- Richard