C++ Parsers for KDevelop (kinda long)

Daniel Berlin dan at cgsoftware.com
Thu Feb 24 20:24:59 GMT 2000

I was browsing the archives of the mailing list, and about a week ago
there was a discussion of C++ parsers, and what KDevelop should be using.
Since this is an area I have a *lot* of expertise in, I figured I'd
subscribe and offer some useful info.

First off, I've literally looked at every single C++ parser out there. In
the projects I needed a C++ parser for (2 development environments),
I had just about the same requirements KDevelop has. I hunted down every
single C++ parser available, be it part of some larger compiler, or
standalone.
I also evaluated writing my own in various ways (parsing is kinda my
forte. I know my way around ANTLR, yacc, btyacc, etc.).

Before I go over the various parsers available, let me say this: I'm not
listing them all, but I've evaluated them all. If you come across one and
want to know anything about it (e.g. how hard it is to integrate into a class
browser), feel free to ask.

So with that out of the way, let me go over the parsers, their pros, cons,
etc.

The C++ parser in OpenC++ (do a web search on OpenC++):

Pros: Probably the most complete of any parser. It can easily handle the
STL now. It requires preprocessed text, but it took me about 3 days to hack
enough of a preprocessor into it so that I could parse non-preprocessed
text, and not have to see the system classes.
You can get *any* info you need out of it: types of variables in
functions, etc.
It's written in C++, and very, very well written. If you want an example of
how to hand-code a recursive descent parser, this is a great one.
It's also the fastest. When I removed the last pass (writing the info back
out), and made it not have to deal with system include files, it took ~60
milliseconds to parse and understand a 30k file.
OpenC++ builds as a library, and the compiler is just a very small driver
linked with that library. So it's already easy to link with
other programs.
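To make "hand-coding a recursive descent parser" concrete for a class browser, here is a minimal sketch. It is not taken from OpenC++; it is a hypothetical illustration, and every name in it (`ClassInfo`, `tokenize`, `Parser`) is made up for the example. Each grammar rule gets its own member function that consumes tokens left to right. It assumes already-preprocessed input and deliberately ignores templates, comments, operator overloads, and most of the real C++ grammar, so it only records class/struct names and the names of their member functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record a class browser might keep for each class.
struct ClassInfo {
    std::string name;
    std::vector<std::string> methods;
};

// Tiny lexer: identifiers/keywords become one token; every other
// non-space character becomes a single-character token.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> toks;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i++;
        if (std::isalpha(c) || c == '_')
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
        toks.push_back(src.substr(start, i - start));
    }
    return toks;
}

// One member function per grammar rule: the core idea of recursive descent.
class Parser {
public:
    explicit Parser(std::vector<std::string> toks) : toks_(std::move(toks)) {}

    // translation_unit := { class_decl | any_other_token }
    std::vector<ClassInfo> parseTranslationUnit() {
        std::vector<ClassInfo> classes;
        while (!atEnd()) {
            if (peek() != "class" && peek() != "struct") { ++pos_; continue; }
            ClassInfo ci;
            if (parseClass(ci)) classes.push_back(ci);
        }
        return classes;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    bool atEnd() const { return pos_ >= toks_.size(); }
    std::string peek() const { return atEnd() ? std::string() : toks_[pos_]; }
    bool accept(const std::string& t) { if (peek() != t) return false; ++pos_; return true; }

    // Skips a balanced open ... close group; must be called on `open`.
    void skipBalanced(const std::string& open, const std::string& close) {
        assert(peek() == open);
        int depth = 0;
        while (!atEnd()) {
            if (peek() == open) ++depth;
            else if (peek() == close) --depth;
            ++pos_;
            if (depth == 0) return;
        }
    }

    // class_decl := ("class" | "struct") IDENT [bases] "{" { member } "}" ";"
    bool parseClass(ClassInfo& ci) {
        ++pos_;  // consume "class" / "struct"
        if (atEnd()) return false;
        ci.name = toks_[pos_++];
        while (!atEnd() && peek() != "{" && peek() != ";") ++pos_;  // base list
        if (!accept("{")) return false;  // forward declaration, not a definition
        while (!atEnd() && !accept("}")) parseMember(ci);
        accept(";");
        return true;
    }

    // member := access ":" | decl ";" | decl "(" params ")" [quals] (";" | body)
    void parseMember(ClassInfo& ci) {
        std::string last;  // most recent name seen; the method name if "(" follows
        while (!atEnd()) {
            std::string t = peek();
            if (t == "}") return;  // end of class body; the caller consumes it
            if (t == ":" || t == ";") { ++pos_; return; }
            if (t == "{") { skipBalanced("{", "}"); return; }  // nested type body
            if (t == "~") { ++pos_; if (!atEnd()) last = "~" + toks_[pos_++]; continue; }
            if (t == "(") {
                if (!last.empty()) ci.methods.push_back(last);
                skipBalanced("(", ")");
                while (!atEnd() && peek() != ";" && peek() != "{" && peek() != "}") ++pos_;
                if (peek() == "{") skipBalanced("{", "}"); else accept(";");
                return;
            }
            last = t;
            ++pos_;
        }
    }
};
```

For example, given `class Shape { public: virtual ~Shape() {} virtual double area() const = 0; };`, calling `Parser(tokenize(src)).parseTranslationUnit()` should return one `ClassInfo` named `Shape` with methods `~Shape` and `area`. A real parser like OpenC++'s follows the same shape, just with far more rules.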

Cons: Uses a garbage collector (Boehm's, to be exact). On the upside, it can
be turned off with a simple define, but then you'd have to add deletes in
the right places.
This could be looked at as a pro, but I was on BeOS, and BeOS doesn't
support Boehm's (I ported it eventually).
The format of the parse tree is hard to use.
On the upside, OpenC++ was built to get information out of a C++
file, and has walkers and whatnot.
You are better off overriding the various functions that were used to call
metaobjects, and making them do your dirty work.

The C++ parser in CTAGS:

Pros: As fast as OpenC++.
Doesn't need preprocessed text
Supports more than just C++
Provides a lot of the info you need

Cons:
Not as easy to integrate
Not all that well written (IMHO. It's hard to follow the flow).

The C++ parser in G++:

Pros: Sorry, there really are none. They want to rewrite it as well. 
Cons: Completely full of hacks, not all that quick, etc.

The C++ parser in DOC++/Doxygen (DOC++ is the original; Doxygen made
improvements on it):

Pros: Written in flex. I kid you not. It's a full C++ parser, written in
flex. It's very easy to extend, and very well written.
Pretty fast, about 40% as fast as OpenC++.
Doesn't care if you give it preprocessed text or not.

Cons:
Uses its own string/vector/etc. implementation. It took about a day of
straight hacking to convert it to use the STL.
I had many problems with memory leaks.
You need to rip out the stuff that handles parsing the DOC++ comments, but
this takes a few minutes at the most.


C-Browser yacc grammar (hard to find):
Pros: Written in Yacc, fast.
Cons: Doesn't handle much

The Empathy C++ Parser (the PCCTS based one):
Pros: Pretty complete, includes preprocessor.
Cons: My brain exploded when I tried to understand how to actually
integrate it, or modify it. And I know PCCTS and ANTLR inside and out.


There are about 10 others (mostly parts of compilers) that aren't really
useful (this includes Roskind's yacc grammar).

All told, your best approach is probably to take OpenC++'s parser, and
make it use something besides lispish PTrees.
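The suggestion to move away from lispish PTrees can be illustrated with a small sketch. Here `LispNode` is a hypothetical stand-in for a lisp-style tree (it is not OpenC++'s actual Ptree class or API): every node is either a leaf token or an untyped list, so client code must remember that, say, child 1 of a class node is its name. The `(class NAME (MEMBER...))` layout, the typed `ClassDecl`/`MemberDecl` structs, and all function names are assumptions made up for the example. The idea shown is that a single converter absorbs the positional conventions, and everything downstream (class browser, completion) works with named fields.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lisp-style node (NOT OpenC++'s real Ptree API): a node is
// either a leaf token or an untyped list, and meaning lives in positions.
struct LispNode {
    std::string leaf;                             // set for leaves
    std::vector<std::shared_ptr<LispNode>> kids;  // set for lists
};
using NodePtr = std::shared_ptr<LispNode>;

NodePtr makeLeaf(std::string s) {
    auto n = std::make_shared<LispNode>();
    n->leaf = std::move(s);
    return n;
}

NodePtr makeList(std::vector<NodePtr> kids) {
    auto n = std::make_shared<LispNode>();
    n->kids = std::move(kids);
    return n;
}

// Typed replacement: named fields instead of "child 1 is the name".
struct MemberDecl {
    std::string type;
    std::string name;
    bool isFunction;
};

struct ClassDecl {
    std::string name;
    std::vector<MemberDecl> members;
};

// Assumed layout: (class NAME ((TYPE NAME) (TYPE NAME (ARGS...)) ...)).
// This converter is the only code that has to know that layout.
ClassDecl toClassDecl(const LispNode& n) {
    assert(n.kids.size() == 3 && n.kids[0]->leaf == "class");
    ClassDecl cd;
    cd.name = n.kids[1]->leaf;
    for (const NodePtr& m : n.kids[2]->kids) {
        MemberDecl md;
        md.type = m->kids.at(0)->leaf;
        md.name = m->kids.at(1)->leaf;
        md.isFunction = m->kids.size() > 2;  // a third child is the argument list
        cd.members.push_back(md);
    }
    return cd;
}
```

Converting once at the boundary is one way to get typed data without modifying the parser's internals; the trade-off is an extra pass over the tree and a second copy of the information.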

I've implemented class browsers and auto-completion, or attempted to, with
each of these parsers.
So feel free to ask if you have any questions.
HTH,
--Dan