New parser branch (Was: Dumping the source DOM?)

Wed Jul 13 15:03:06 UTC 2005

Roberto Raggi wrote:

> #include <my-cool-header.h>
> 
> int main()
> {
>   my_cool_function("ciao\n");
> }
> 
> because the IDE doesn't know where the file my-cool-header.h is. KDevelop
> works just fine in this case; gccxml will fail. THIS IS NOT ACCEPTABLE!

Since any reasonable project manager will allow you to specify include
paths, what's the problem?

> 
> 
>> One possible approach I had in mind was to make the parser restartable.
>> First you run the g++ parser on the code up to the first token it cannot
>> parse. As you
> 
> wrong!
> 
>   1) gcc takes about 50% of your CPU. It is a bit too much for a
>      background parser (== you can't type in KDevelop and compile your
>      project with the "real" gcc at the same time)
.....
>  3) importing the source code of a project will take almost as long as
>     compiling it. So you have to wait about 1 hour before loading the
>     KDevelop project, 2 hours for kdelibs/kdebase, and so on.

To begin with, your numbers are wrong. Here's the timing output from gcc for
a certain file:

 cfg construction      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 cfg cleanup           :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.50 ( 1%) wall
 trivially dead code   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 life analysis         :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 life info update      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 preprocessing         :   0.71 ( 1%) usr   0.26 ( 6%) sys   1.00 ( 1%) wall
 lexical analysis      :   0.22 ( 0%) usr   0.44 (10%) sys   2.00 ( 2%) wall
 parser                :   3.83 ( 4%) usr   0.75 (18%) sys   5.00 ( 5%) wall
 name lookup           :   1.74 ( 2%) usr   2.17 (51%) sys   3.00 ( 3%) wall
 expand                :   0.52 ( 1%) usr   0.02 ( 0%) sys   1.00 ( 1%) wall
 varconst              :   0.17 ( 0%) usr   0.02 ( 0%) sys   0.50 ( 1%) wall
 integration           :   0.13 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 jump                  :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 flow analysis         :   0.02 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 mode switching        :   0.11 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 local alloc           :   0.22 ( 0%) usr   0.03 ( 1%) sys   0.50 ( 1%) wall
 global alloc          :   0.56 ( 1%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 flow 2                :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 shorten branches      :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 reg stack             :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 final                 :   0.25 ( 0%) usr   0.04 ( 1%) sys   0.50 ( 1%) wall
 symout                :  77.09 (89%) usr   0.45 (11%) sys  77.00 (85%) wall
 rest of compilation   :   0.32 ( 0%) usr   0.04 ( 1%) sys   0.00 ( 0%) wall

It spends 85% of the wall-clock time doing what? Basically, first emitting code
for all the template functions there are, and then outputting debug info for
the myriad of template classes from Boost ('symout' above). Everything else,
the actual parsing included, takes a mere 15%.
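The arithmetic behind those percentages can be checked with a small C++ sketch. The rows are the wall-clock column of the table above; treating preprocessing, lexical analysis, parser and name lookup as the "front end" is an assumption made here for illustration, not a grouping gcc reports.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the timing table: phase name and wall-clock seconds.
struct Phase {
    std::string name;
    double wall;
};

// Wall-clock column of the table above, row by row.
const std::vector<Phase> kPhases = {
    {"cfg construction", 0.00}, {"cfg cleanup", 0.50},
    {"trivially dead code", 0.00}, {"life analysis", 0.00},
    {"life info update", 0.00}, {"preprocessing", 1.00},
    {"lexical analysis", 2.00}, {"parser", 5.00},
    {"name lookup", 3.00}, {"expand", 1.00},
    {"varconst", 0.50}, {"integration", 0.00},
    {"jump", 0.00}, {"flow analysis", 0.00},
    {"mode switching", 0.00}, {"local alloc", 0.50},
    {"global alloc", 0.00}, {"flow 2", 0.00},
    {"shorten branches", 0.00}, {"reg stack", 0.00},
    {"final", 0.50}, {"symout", 77.00},
    {"rest of compilation", 0.00},
};

// Percentage of the total wall time spent in the phases accepted by `pred`.
template <class Pred>
double wall_share(Pred pred) {
    double total = 0.0, selected = 0.0;
    for (const Phase& p : kPhases) {
        total += p.wall;
        if (pred(p)) selected += p.wall;
    }
    assert(total > 0.0);
    return 100.0 * selected / total;
}

// Assumed grouping of the phases that make up the C++ front end.
bool is_front_end(const Phase& p) {
    return p.name == "preprocessing" || p.name == "lexical analysis" ||
           p.name == "parser" || p.name == "name lookup";
}
```

With these rows the wall-clock total is 91 s: 'symout' accounts for about 84.6% of it and the four front-end phases for about 12.1%, consistent with the rounded figures gcc printed.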

>   2) it will not help code completion. In fact you will not be able to
>      perform any code completion, because the code is *UNFINISHED* (please
>      read it again: IT IS UNFINISHED). gcc will not produce any abstract
>      syntax tree, you will not populate the code model, and you will not
>      have code completion

I'm afraid you are wrong again. The gcc parser is just a recursive-descent
one, and each function returns a value of type 'tree', which is just your
AST. In general, it's not even possible to parse C++ without maintaining
correct symbol tables at *parse time*. Consider this:

   template<class T1>
   struct Outer {
       template<class T2> void foo() {}
       void bar() {}
   };

   int main()
   {
       Outer<int> v;
       v.foo<int>();
   }

It's only possible to parse the call to 'foo' correctly if you know the type
of 'v' and can look into the scope of 'v' to determine that 'foo' is a
function template, and not something else; otherwise the '<' after 'foo'
would have to be read as a less-than operator.
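The lookup the parser must do at that point can be sketched in a few lines of C++. SymbolTable, MemberKind and classify_less_than are hypothetical names invented for this illustration, and the table is keyed by the spelled specialization name ("Outer<int>"), a big simplification of what a real front end keeps. The rule it encodes is the standard one for non-dependent types: a name followed by '<' starts a template argument list only when name lookup finds a template. Dependent types, where C++ requires the 'template' keyword instead, are ignored.

```cpp
#include <cassert>
#include <map>
#include <string>

// What a member name denotes (a real front end records far more).
enum class MemberKind { FunctionTemplate, Function, DataMember };

// Hypothetical, heavily simplified symbol table:
// class (or specialization) name -> member name -> kind.
using Scope = std::map<std::string, MemberKind>;
using SymbolTable = std::map<std::string, Scope>;

enum class LessThanMeaning { TemplateArgumentList, LessThanOperator };

// Decides how to parse `obj.member <` given the static type of `obj`.
// The '<' opens a template argument list only if lookup in the object's
// class finds a member template; otherwise it is the less-than operator.
LessThanMeaning classify_less_than(const SymbolTable& symbols,
                                   const std::string& object_type,
                                   const std::string& member) {
    assert(!member.empty());  // a member name always follows '.'
    auto cls = symbols.find(object_type);
    if (cls != symbols.end()) {
        auto m = cls->second.find(member);
        if (m != cls->second.end() && m->second == MemberKind::FunctionTemplate)
            return LessThanMeaning::TemplateArgumentList;
    }
    return LessThanMeaning::LessThanOperator;
}
```

For the example above, an entry mapping "Outer<int>" to {foo: FunctionTemplate, bar: Function} makes 'v.foo<' start template arguments, while 'v.bar<' would be a comparison.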

The fact that gccxml will not produce a parse tree unless you feed it a
complete translation unit does not mean that the gcc parser does not build
an AST.
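To make the "each function returns a tree" point concrete, here is a toy recursive-descent parser in C++ for sums and products of integers. It is not gcc's code, and the Node type is invented for the example; the point is only the shape: every parse_* function consumes the tokens of one grammar rule and returns the subtree for it, so the AST falls out of the call structure. Whitespace is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Toy AST node: a number leaf (op == 'n') or a binary '+' / '*' node.
struct Node {
    char op;
    int value;  // meaningful only for leaves
    std::unique_ptr<Node> lhs, rhs;
};

class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // Parses the whole input and returns its tree.
    std::unique_ptr<Node> parse() {
        std::unique_ptr<Node> tree = parse_sum();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return tree;
    }

private:
    // sum := product ('+' product)*
    std::unique_ptr<Node> parse_sum() {
        std::unique_ptr<Node> lhs = parse_product();
        while (peek() == '+') {
            ++pos_;
            std::unique_ptr<Node> rhs = parse_product();
            lhs = make_binary('+', std::move(lhs), std::move(rhs));
        }
        return lhs;
    }

    // product := primary ('*' primary)*
    std::unique_ptr<Node> parse_product() {
        std::unique_ptr<Node> lhs = parse_primary();
        while (peek() == '*') {
            ++pos_;
            std::unique_ptr<Node> rhs = parse_primary();
            lhs = make_binary('*', std::move(lhs), std::move(rhs));
        }
        return lhs;
    }

    // primary := digit+ | '(' sum ')'
    std::unique_ptr<Node> parse_primary() {
        if (peek() == '(') {
            ++pos_;
            std::unique_ptr<Node> inner = parse_sum();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        if (!is_digit(peek())) throw std::runtime_error("expected a number");
        std::unique_ptr<Node> leaf(new Node{'n', 0, nullptr, nullptr});
        while (is_digit(peek())) leaf->value = leaf->value * 10 + (src_[pos_++] - '0');
        return leaf;
    }

    static std::unique_ptr<Node> make_binary(char op, std::unique_ptr<Node> l,
                                             std::unique_ptr<Node> r) {
        return std::unique_ptr<Node>(new Node{op, 0, std::move(l), std::move(r)});
    }

    static bool is_digit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }

    std::string src_;
    std::size_t pos_ = 0;
};

// Renders a tree in prefix form, e.g. "(+ 1 (* 2 3))".
std::string dump(const Node& n) {
    if (n.op == 'n') return std::to_string(n.value);
    assert(n.lhs && n.rhs);  // binary nodes always have both children
    return std::string("(") + n.op + " " + dump(*n.lhs) + " " + dump(*n.rhs) + ")";
}
```

Parsing "1+2*3" yields the tree "(+ 1 (* 2 3))": operator precedence is encoded purely by which function calls which.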

>> type more tokens, you feed them to the parser. If you go to the beginning
>> of the file and start typing there, you rewind the parser state and start
>> parsing again.
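The "rewind and reparse" scheme can be sketched as follows, under loud assumptions: RestartableParser, ParserState and step() are hypothetical, and the state is reduced to a brace depth and a statement count standing in for symbol tables and partially built trees. A snapshot is kept after every token, so an edit at token k resumes from the state after token k-1 instead of from the top of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a parser's complete state. A real C++ parser
// would also carry its symbol tables and partially built AST nodes.
struct ParserState {
    int brace_depth = 0;
    std::size_t statements = 0;  // number of ';' tokens seen so far
};

class RestartableParser {
public:
    // Feed one more token, e.g. because the user just typed it.
    void feed(const std::string& token) {
        tokens_.push_back(token);
        state_ = step(state_, token);
        snapshots_.push_back(state_);  // snapshots_[i]: state after tokens_[i]
        ++tokens_fed_;
    }

    // An edit at token index `pos`: drop everything from `pos` on, rewind
    // to the state saved just before `pos`, and parse only `new_tail`
    // (the edited tokens followed by the rest of the file).
    void edit(std::size_t pos, const std::vector<std::string>& new_tail) {
        assert(pos <= tokens_.size());
        tokens_.resize(pos);
        snapshots_.resize(pos);
        state_ = (pos == 0) ? ParserState{} : snapshots_[pos - 1];
        for (const std::string& t : new_tail) feed(t);
    }

    const ParserState& state() const { return state_; }

    // Total tokens processed since construction, to show how little is reparsed.
    std::size_t tokens_fed() const { return tokens_fed_; }

private:
    // One parsing step with the same shape as a real one:
    // (old state, token) -> new state.
    static ParserState step(ParserState s, const std::string& token) {
        if (token == "{") ++s.brace_depth;
        else if (token == "}") --s.brace_depth;
        else if (token == ";") ++s.statements;
        return s;
    }

    std::vector<std::string> tokens_;
    std::vector<ParserState> snapshots_;
    ParserState state_;
    std::size_t tokens_fed_ = 0;
};
```

Feeding the six tokens "int x ; { y ;" and then editing at index 3 with "{ } ;" reparses only those three tokens. Keeping a snapshot per token costs memory; checkpointing only at top-level declarations would be a cheaper variant of the same idea.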
> 
> I think I will stop here. This thread is starting to get annoying. I'm sorry
> Vladimir, but I don't think you know what you're talking about. Anyway,
> good luck with your project. Maybe it is me who doesn't see your point,
> and maybe you're right that gccxml, parsing *only* valid source code, is
> the right solution for KDevelop.

I'll stop here too, and hope we won't get yet another parser that can
parse only the easy 90% of C++.

- Volodya