Adding location info to the C++ parser

Tue Mar 21 14:28:15 UTC 2006

On Tuesday 21 March 2006 03:23, Roberto Raggi wrote:
> Hi Matt!
>
> On Friday 17 March 2006 04:53, Matt Rogers wrote:
> > Hi,
> >
> > I've come up with a plan that I intend to use to modify the C++ parser so
> > that it provides proper location info for use in KDevelop. I submit my
> > plan here so that it can be reviewed, suggestions can be provided, or I
> > can be flat out told I'm wrong.  :)
>
> COOL!
>
> > ==========================
> >
> > Plan:
> >
> > Change: Modify the preprocessor so it does not strip indentation or blank
> > lines (the blank lines mostly appear where comments have been removed)
>
> makes sense.. I know rpp's code is pretty weird.. I was trying to be cool
> when I wrote it (I used a lot of STL crap :-), but now I understand it was a
> stupid idea. We really should replace all that crap with *cute* Qt code ;-)
> BTW let me know if you have any problem with the source code..
>

Actually, once I figured out how many design patterns were used in the parser, 
it was quite easy. :)

Thanks for adding the code that keeps track of the line numbers. Adam's code 
view already works much better now. :)

> > Change: Verify the preprocessor outputs line number markers similar to
> > those output by gcc -E and if it does not, modify the preprocessor to
> > output line number markers similar to the output of gcc -E
>
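
To make the marker idea concrete, here is a minimal sketch of how a consumer
could map lines of preprocessed output back to the original file and line. It
assumes the preprocessor emits markers of the form `# <line> "<file>" [flags]`,
as gcc -E does. SourceLocation, parseLineMarker and mapOutputLines are made-up
names for illustration and are not part of rpp or the KDevelop parser.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Where one line of preprocessed output originally came from.
struct SourceLocation {
    std::string file;
    int line;
};

// Try to parse a marker of the form:  # <line> "<file>" [flags...]
// Returns true and fills `out` on success. Escaped quotes inside the
// file name are not handled in this sketch.
bool parseLineMarker(const std::string& text, SourceLocation& out) {
    std::istringstream in(text);
    char hash = 0;
    int line = 0;
    if (!(in >> hash) || hash != '#') return false;
    if (!(in >> line)) return false;          // e.g. "#include" fails here
    in >> std::ws;
    if (in.get() != '"') return false;
    std::string file;
    // Reject an unterminated file name (getline would hit end of input).
    if (!std::getline(in, file, '"') || in.eof()) return false;
    out.file = file;
    out.line = line;
    return true;
}

// Return one SourceLocation per non-marker line of the preprocessed text.
std::vector<SourceLocation> mapOutputLines(const std::string& preprocessed) {
    std::vector<SourceLocation> result;
    SourceLocation current{"<unknown>", 1};   // assumed default before any marker
    std::istringstream in(preprocessed);
    std::string text;
    while (std::getline(in, text)) {
        SourceLocation marker{"", 0};
        if (parseLineMarker(text, marker)) {
            current = marker;                 // a marker names the line that follows it
            continue;
        }
        result.push_back(current);
        ++current.line;
    }
    return result;
}
```

Because the markers are dropped from the result, entry N of the returned
vector describes the N-th line of real code in the preprocessed output.
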
> my original plan for KDevelop's C++ engine was to use the preprocessor
> as tokenizer. ATM the preprocessor generates plain C++ code (as g++ -E),
> but it would be nice to generate the token stream, and not plain C++ code.
> This is possible and it shouldn't be too difficult. This approach has many
> advantages, like:
>   - the C++ engine will use less memory (no need to generate a temp
>     buffer)
>   - we can use the same trick I used in KDevelop 3 to *fix* the column
>     positions after the macro expansion
>   - less code to maintain (we kill the C++ tokenizer)
>

hmm, yes, this is a good idea too. However, Richard Dale came up with a nice 
idea of having two sets of tokens, which I quote below:

 I think you need to have two sets of tokens, the first set when parsing the 
 original source before preprocessing and these tokens would have line/column 
 info for the original source. Then after preprocessing there would need to be 
 a second set of tokens which are passed to the language parser. The second 
 set of tokens might have pointers to the token in the first set that they 
 were 'derived' from via a preprocessor expansion. The reason for this is that 
 if the parser is to be used for refactoring it must be able to know which 
 chunks of text in the original source correspond to a particular grammar 
 rule, and as far as I can see this can only be done by introducing an extra 
 set of tokens and with an extra level of indirection in the second set.
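
A rough sketch of the two-token-set idea, purely to illustrate the extra
level of indirection. All names here (SourceToken, ExpandedToken,
TokenStreams, originOf) are hypothetical and do not exist in the current
parser. It also uses indices into the first set instead of raw pointers, so
the links stay valid if the vector reallocates; that is a choice made for
this sketch, not something from the proposal itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// First set: tokens of the original source, with real line/column info.
struct SourceToken {
    std::string text;
    int line;
    int column;
};

// Second set: tokens handed to the language parser after preprocessing.
// `origin` is the index of the source token this one was derived from.
struct ExpandedToken {
    std::string text;
    std::size_t origin;
};

struct TokenStreams {
    std::vector<SourceToken> source;
    std::vector<ExpandedToken> expanded;

    // Follow the indirection from a parser token back to the original text.
    const SourceToken& originOf(std::size_t expandedIndex) const {
        assert(expandedIndex < expanded.size());
        const ExpandedToken& tok = expanded[expandedIndex];
        assert(tok.origin < source.size());
        return source[tok.origin];
    }
};
```

For example, with `#define MAX 10` on line 1 and `int x = MAX;` on line 2, the
parser sees a token `10` whose origin is the source token `MAX` at line 2,
column 9. A refactoring tool can then find the chunk of original text that
belongs to a grammar rule by looking up the origins of its first and last
tokens.
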

> > Please let me know what you think, if I'm on the right track, if I'm just
> > completely wrong, if I've left out something, etc. I would appreciate any
> > feedback. I will attempt to keep the parser as fast as it is now, but I
> > can't guarantee anything.
>
> well.. I think it's a cool plan, and I'm sure you will make it. As I said
> before, let me know if you have any problem.
>

will do, thanks. :)
--
Matt