C++ code completion status report

Wed Jan 9 16:59:04 UTC 2002

Hi Richard,

Looks like there's some need for 'enlightenment'. ;)

> > Horrifying, isn't it? But fortunately we only need to parse
> > declarations.
> > Only a few statements.
> Yes, but if the grammar for everything else was still there (as opposed to
> just skipping tokens that weren't part of declarations), it would be easier
> to use as a basis for code refactoring etc.

> > I tested my top-down-parser on a buggy expression and -
> > it worked without moaning. But now I need backtracking.
> > So I need to think twice before continuing work.
> I can help with implementing non-deterministic parsers, but I still don't
> think we need one for the problem in hand.

Well, my problem is error resistance. When a bottom-up
parser like bison aborts, its stack is full of pointers to nodes
of some tree. They all have to be deleted somehow, or else the
RAM usage of the parser would be really bad :). But we cannot
solve this problem with a destructor, since it would never be called:
bison's parser is C, and C has no 'new' or 'delete'.
The other problem is that those nodes cannot easily be linked
to higher nodes (like operator nodes) without having lots of rules
handling errors.
OK, let me use an example:
 "a=b-c." is our incomplete expression.
Our grammar is:

expr: expr '=' expr { $$ = new ExprNode('=',$1,$3); }
    | expr '-' expr { $$ = new ExprNode('-',$1,$3); }
    | expr '.' id_expr { $$ = new ExprNode ('.', $1, $3); }
    | ...
id_expr: ID { $$ = new ExprNode((char*)yylval); }

On parsing we'd now have id_expr matched three times,
but then an error would occur: '-' and '=' haven't been
reduced yet, since '.' has the highest priority. Due to the
error nothing would be reduced, and three ExprNodes would
be allocated and roam somewhere in memory,
lost forever. And we'd never find out that to the left of the '.'
was a "c" and not a higher-priority operator. My top-down parser
has no such problem: it would simply put an error node
in there and continue parsing.
Maybe parsing backwards would be a solution (as
written below).
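
To make the top-down idea concrete, here is a minimal sketch. It is not my actual parser, just a small precedence-climbing parser for the three operators from the grammar above. The names Node, Parser, leaf, binary and dump are made up for illustration, and the input is assumed to contain no whitespace. The nodes are owned by std::unique_ptr, so nothing can leak on an error, and a missing operand simply becomes an "<error>" leaf.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type, loosely modelled on the ExprNode in the grammar above.
struct Node {
    std::string text;                // operator, identifier, or "<error>"
    std::unique_ptr<Node> lhs, rhs;  // owned children, freed automatically
};
using NodePtr = std::unique_ptr<Node>;

NodePtr leaf(std::string text) {
    auto n = std::make_unique<Node>();
    n->text = std::move(text);
    return n;
}

NodePtr binary(char op, NodePtr lhs, NodePtr rhs) {
    auto n = leaf(std::string(1, op));
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

// Binding strength: '.' binds tightest, '=' loosest; 0 means "not an operator".
int precedence(char c) {
    switch (c) {
        case '=': return 1;
        case '-': return 2;
        case '.': return 3;
        default:  return 0;
    }
}

class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // Parse an expression whose operators bind at least as tightly as minPrec.
    NodePtr parse(int minPrec = 1) {
        NodePtr lhs = primary();
        while (pos_ < src_.size() && precedence(src_[pos_]) >= minPrec) {
            char op = src_[pos_++];
            // '=' is right-associative, '-' and '.' are left-associative.
            int next = (op == '=') ? precedence(op) : precedence(op) + 1;
            lhs = binary(op, std::move(lhs), parse(next));
        }
        return lhs;
    }

private:
    // Read an identifier; if none is there, return an error node instead of aborting.
    NodePtr primary() {
        std::string id;
        while (pos_ < src_.size() &&
               std::isalnum(static_cast<unsigned char>(src_[pos_])))
            id += src_[pos_++];
        return leaf(id.empty() ? "<error>" : id);
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// Print the tree in prefix form, e.g. "(- b c)".
std::string dump(const Node& n) {
    if (!n.lhs) return n.text;
    return "(" + n.text + " " + dump(*n.lhs) + " " + dump(*n.rhs) + ")";
}
```

With this sketch, dump(*Parser("a=b-c.").parse()) yields "(= a (- b (. c <error>)))", so the "c" to the left of the '.' is still known when the error node is created.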

> > Yes, I also had some thoughts about 'reverse parsing'. But it's also quite
> > difficult. Also it may be better if we had a more powerful parser to let
> > it be useful for later extensions.
> >
> > > + b. ==> look for a type declaration on a previous statement
> > > (mytype) b ==> use mytype as type
> > > mytype b ==> use mytype as type
> >
> > mytype (b) ==> use mytype as type
> But only in a constructor list?
No, that's a function-style typecast
(e.g.: QString *name=new QString("Thomas"); or int i = int('a'); )

> > And how about this:
> > class A { ... }; class B { ... }; class C { ... };
> > C operator+(A,B);
> > A a; B b;
> > ... (a+b). // we should list C's members
> >
> > Darn C++, uh? ... ;)
> Bad coding style if you ask me! If I don't keep the result of 'a+b' in a
> variable, what if I might need the result again later in the code - it would
> need recomputing from 'a+b'?

I don't think so, e.g.: double operator*(Vector3D&, Vector3D&);

> If they are the sort of person who needs code
> completion (ie they aren't 'super geeks' who can remember all the methods in
> a class library after one read, and they'll be using emacs forever anyway :)
> ), then they'll probably code this in two statements as:

I'd like to have CC. It's always nice when you have member
names longer than 4 chars. Or how often do you use constants?
Do you remember all their names?

> > BTW: "a+" could also get CC - if it's a class and has '+' overloaded
> >  (so it's like argument hinting)
> Nothing before 'a' and 'a+' are the same as '+a'; the only thing that
> overrides a previously declared type would be a cast in brackets (or if 'a'
> was being declared for the first time as 'mytype a') I would have thought.

Uh, well, hm, I don't know what you mean, but I meant that
"a+" could get code completion if there's
"class A { A& operator+( anytype ); }; "
So the CC part could show "anytype" (in a hint box or so).
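
As a rough sketch of that hinting idea: assume the parser has already collected the overloaded operators into a list. The OperatorDecl record and the hintsFor function are hypothetical names for illustration only, not anything that exists in our code.

```cpp
#include <string>
#include <vector>

// Hypothetical record of one overloaded binary operator found by the parser.
struct OperatorDecl {
    std::string lhsType;     // class that declares the operator, e.g. "A"
    std::string op;          // operator token, e.g. "+"
    std::string rhsType;     // parameter type, e.g. "anytype"
    std::string resultType;  // return type, e.g. "A&"
};

// Collect the parameter types that may follow "<lhsType> <op>",
// i.e. the strings a hint box could show after the user typed "a+".
std::vector<std::string> hintsFor(const std::vector<OperatorDecl>& decls,
                                  const std::string& lhsType,
                                  const std::string& op) {
    std::vector<std::string> hints;
    for (const auto& d : decls)
        if (d.lhsType == lhsType && d.op == op)
            hints.push_back(d.rhsType);
    return hints;
}
```

For the class above, the list would hold one entry {"A", "+", "anytype", "A&"}, and hintsFor(decls, "A", "+") would return just "anytype". A type without a matching operator gives an empty list, so no hint is shown.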

> That's why it's better to only invoke code completion at the user's request -
> you are just creating problems like this otherwise. What if I would like to
> type 'foobar->get' and expect to be shown all the methods beginning with
> 'get..' - I think code completion should be 'user driven', and not running
> all the time? We should avoid parsing the entire file every time the user
> presses a character, because it is bound to be too slow for large files.

Of course automatic code completion can become annoying.
Therefore you must have the chance to disable automatic CC
for some cases (e.g. only members, only types, ...). But member
completion especially can avoid errors, since there can be no
misspellings of members and constants.

But this, of course, must not slow down editing.
Therefore at least the following optimization is necessary:
things that didn't change must not be reparsed. So we'd keep
a sort of mini-CS for the current file. Reparsing a changed statement
could even happen only when a special token is typed (e.g.: ";"). This
can speed up CC dramatically, since it reduces reparses.
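
A minimal sketch of that idea, under some loud assumptions: statements are split naively at every ';' (so strings, comments and for(;;) would confuse it), and the "parser" is only a stand-in that records the statement length. StatementCache and splitStatements are made-up names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Split a buffer into statements, cutting after each ';' (naive on purpose).
std::vector<std::string> splitStatements(const std::string& buffer) {
    std::vector<std::string> out;
    std::string current;
    for (char c : buffer) {
        current += c;
        if (c == ';') {
            out.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) out.push_back(current);  // unfinished trailing statement
    return out;
}

// Hypothetical cache that remembers the parse result of each statement.
class StatementCache {
public:
    // Bring the cache up to date with the buffer and return how many
    // statements actually had to be parsed again.
    std::size_t update(const std::string& buffer) {
        std::size_t reparsed = 0;
        std::unordered_map<std::string, int> next;
        for (const auto& stmt : splitStatements(buffer)) {
            auto it = parsed_.find(stmt);
            if (it != parsed_.end()) {
                next[stmt] = it->second;   // unchanged: reuse the old result
            } else {
                next[stmt] = parse(stmt);  // new or edited: parse again
                ++reparsed;
            }
        }
        parsed_ = std::move(next);         // forget statements that vanished
        return reparsed;
    }

private:
    // Stand-in for the real parser; it just records the statement length.
    static int parse(const std::string& stmt) {
        return static_cast<int>(stmt.size());
    }

    std::unordered_map<std::string, int> parsed_;
};
```

Calling update("int a;int b;") twice parses two statements the first time and none the second. Changing only the second statement afterwards costs a single reparse.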

Thomas

P.S.: The latter idea actually came to me while writing this
(sounds good, doesn't it?) :D