c++ code completion status report

Fri Jan 4 20:16:03 UTC 2002

Hi Thomas,

On Friday 04 January 2002 20:41, Thomas Schilling wrote:
>
> I think this won't suffice.
>
> Ok, I'll show you my idea of the CC parser:
> (The UI I will keep for another day :)
>
> First, Here's an example the parser must be able to handle:
> |  struct S1 { int x, y; };
> |  struct S2 { int a, b; };
> |  class C { public:
> |   S1 a(int i);
> |   S2 a(double d);
> | };
> | /* ... */
> | int j, k;   C c;
> | j = c.a(2). // list members of S1
> | k = c.a(2.0). // list members of S2
>
> So the parser has to be able to evaluate an arbitrary
                                                ^^^^^^^^^^^^^
> expression concerning it's result type to determine what
^^^^^^^^
> to do/show.

I think you've eaten too many brownies in new year's party :) However, you 
should _not_ evaluate any expression. What you refer to is static type 
analysis and you should not try to go beyond that. 

> My idea was to build up a complete parse tree
> (doesn't matter if through DOM or any other interface - I'm
> working on one that is as small as possible). In that tree we
> then look for the node we have to determine the type (this
> can be a complete sub-expr.) and decide what to do.
>

It would be much better to use  a dependency grammar for that purpose. You 
should be using an attribute grammar rather than inventing how to write a 
parser. (It looks like there are many available implementations) And you need 
only a type analyzer, which isn't too difficult: however typedefs and 
templates complicate this matter beyond recognition. My advice is to first 
make a thorough survey of what other people have done.

> The best method (I think (at the moment)) would be to parse
> the current source file up to the cursor and examine the node
> created last. We have to consider all locally and globally
> defined variables and types (maybe even labels) and then
> gain our information.
>

No, the best method is an incremental parser, which is beyond the 
capabilities of our middle-level imperative implementation language, of 
course unless we have a clever coder who can devote the rest of his life to 
implement an incremental LALR parser + type semantics.

In our constraints, the best way is of course not to parse the entire file, 
with all its includes, every time a character is pressed, right? What we 
should be doing is to approximate an incremental parser by storing all type 
analysis in a database, and try to update that information which has been 
changed. That is, compute only lazily. The required level of granularity is 
per-file. Store the type analysis results of all project-files in the 
database. Also required is a simple inference engine to account for 
inter-module dependencies. Dependencies determine whether type analysis is 
required for a given module.

> Since this parser examines only the current file it's pure
> memory access and thus should be very fast. A simple
> optimation would be to skip all blocks (compound statements)
> that close before the block the cursor is at (but only code
> blocks, not declarations).

Well, you don't only examine the _current_ file as I've said of course. You 
need all type information in the included interfaces. Current file becomes 
quite large after the preprocessor is done with it.

Doing the type analysis correct and complete for C++ language doesn't seem to 
me as easy as you describe, though.

> > a the "high-speed" project loading.
>
> I vote FOR a persistant CS, for I don't know how much
> memory lot's of libraries would cost. And to end up this
> XML or DOM or whatever discussion: It may be possible
> to use XML or DOM as the communication layer between
> the front-end (tree of libraries) and the back-end (any sort
> of database or file). Otherwise I don't know what to use it
> for. Ain't this what XML want's to be - a framework for
> structured information interchange?
>

I think this depends on what you would like to use a static type analyzer 
for. If you would like to have interactivity of any sort you would have to 
forget about untuned, bloated structure code. (QDom) I'm not mentioning XML 
at all.

Also, the only application of such a code should not be code completion.

Thanks,

-- 
Eray Ozkural (exa) <erayo at cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C