Adding location info to the C++ parser

Matt Rogers mattr at
Fri Mar 17 22:36:08 UTC 2006

On Friday 17 March 2006 03:17, Richard Dale wrote:
> On Friday 17 March 2006 03:53, Matt Rogers wrote:
> > Hi,
> >
> > I've come up with a plan that I intend to use to modify the C++ parser so
> > that it provides proper location info for use in KDevelop. I submit my
> > plan here so that it can be reviewed, suggestions can be provided, or I
> > can be flat out told I'm wrong.  :)
> >
> > ==========================
> >
> > Plan:
> >
> > Change: Modify the preprocessor so it does not strip indentation or blank
> > lines (blank lines mostly appear where comments have been removed)
> >
> > Reason: Proper column information is needed and if the preprocessor
> > removes indentation that will mess up column information. If the
> > preprocessor removes comments and the newline that follows them, then the
> > line information is automatically thrown off.
> >
> > Change: Verify the preprocessor outputs line number markers similar to
> > those output by gcc -E and if it does not, modify the preprocessor to
> > output line number markers similar to the output of gcc -E
> >
> > Reason: This needs to be done to ensure that the parser (via the
> > tokenizer) has proper line numbers to work with.
> >
> > Change: Modify the tokenizer to store line and column information within
> > the tokens
> >
> > Reason: This needs to be done so that the parser can add this information
> > to the code model via the binder
> >
> > ==========================
> >
> > Please let me know what you think, if I'm on the right track, if I'm just
> > completely wrong, if I've left out something, etc. I would appreciate any
> > feedback. I will attempt to keep the parser as fast as it is now, but I
> > can't guarantee anything.
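
[A minimal sketch of the last two steps of the plan above: a tokenizer that stores line and column in each token and resets its line counter when it sees a gcc -E style line marker (`# <number> "<file>"`). The `Token` struct and `tokenize` function are hypothetical names for illustration, not the actual KDevelop parser API, and tokens are simply split on whitespace to keep the example short.]

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token type (an assumption, not the real KDevelop Token).
struct Token {
    std::string text;
    std::string file;  // file named by the most recent line marker
    int line;          // 1-based line within `file`
    int column;        // 1-based column of the token's first character
};

// Splits preprocessed text into whitespace-separated tokens.
// A line of the form `# <number> "<file>"` is treated as a line marker:
// it produces no tokens and sets the line number and file of the NEXT line.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::istringstream in(input);
    std::string lineText;
    std::string file = "<input>";
    int line = 1;

    while (std::getline(in, lineText)) {
        // Try to read the line as a marker: '#' followed by an integer.
        std::istringstream marker(lineText);
        char hash = 0;
        int markedLine = 0;
        if (marker >> hash >> markedLine && hash == '#') {
            std::size_t open = lineText.find('"');
            std::size_t close = lineText.rfind('"');
            if (open != std::string::npos && close > open)
                file = lineText.substr(open + 1, close - open - 1);
            line = markedLine;
            continue;  // the marker itself does not occupy a source line
        }

        // Ordinary line: indentation is preserved, so columns stay accurate.
        std::size_t i = 0;
        while (i < lineText.size()) {
            if (std::isspace(static_cast<unsigned char>(lineText[i]))) {
                ++i;
                continue;
            }
            std::size_t start = i;
            while (i < lineText.size() &&
                   !std::isspace(static_cast<unsigned char>(lineText[i])))
                ++i;
            tokens.push_back({lineText.substr(start, i - start), file, line,
                              static_cast<int>(start) + 1});
        }
        ++line;  // blank lines still advance the counter
    }
    return tokens;
}
```

[With the input `# 10 "foo.h"` followed by the line `    int x;`, the token `int` is recorded at foo.h, line 10, column 5. This only works because the indentation and blank lines were left in place.]
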
> I think you need to have two sets of tokens, the first set when parsing the
> original source before preprocessing and these tokens would have
> line/column info for the original source. Then after preprocessing there
> would need to be a second set of tokens which are passed to the language
> parser. The second set of tokens might have pointers to the token in the
> first set that they were 'derived' from via a preprocessor expansion. The
> reason for this is that if the parser is to be used for refactoring it must
> be able to know which chunks of text in the original source correspond to a
> particular grammar rule, and as far as I can see this can only be done by
> introducing an extra set of tokens and with an extra level of indirection
> in the second set.
> I don't think you can get round the problem by not stripping comments and
> white space, because a macro expansion on a particular line will obviously
> screw up the column info of any items on the same line that follow it.
> -- Richard
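
[A rough sketch of the two-token-set idea described above, under some assumptions: `SourceToken`, `ParserToken`, `MacroTable`, `expand` and `originalOf` are hypothetical names, only object-like macros are handled, and the "pointer" to the original token is stored as an index into the first set so that it stays valid if the vector is copied.]

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// First set: tokens of the original, unpreprocessed source.
struct SourceToken {
    std::string text;
    int line;
    int column;
};

// Second set: tokens handed to the language parser. Each one records
// which original token it was derived from.
struct ParserToken {
    std::string text;
    std::size_t origin;  // index into the first set
};

// Object-like macros only: name -> replacement token texts (an assumption
// made to keep the sketch small).
using MacroTable = std::map<std::string, std::vector<std::string>>;

// Builds the second set from the first. Every token produced by a macro
// expansion points back at the macro name in the original source.
std::vector<ParserToken> expand(const std::vector<SourceToken>& source,
                                const MacroTable& macros) {
    std::vector<ParserToken> out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        auto it = macros.find(source[i].text);
        if (it == macros.end()) {
            out.push_back({source[i].text, i});
        } else {
            for (const std::string& piece : it->second)
                out.push_back({piece, i});
        }
    }
    return out;
}

// The extra level of indirection: recover the original location.
const SourceToken& originalOf(const ParserToken& token,
                              const std::vector<SourceToken>& source) {
    return source[token.origin];
}
```

[For `x = MAX + y` with `MAX` defined as `( 100 )`, the expanded stream has seven tokens instead of five, so `y` sits at a different position in the preprocessed text. `originalOf` still reports the column `y` had in the original file, and all three tokens of the expansion map back to the location of `MAX`.]
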

Yes, I hadn't thought about that. After thinking about it a bit more, I'm 
pretty sure a preprocess pass would only be needed to pull in symbols from 
includes so that they're parseable for code completion purposes and to verify 
that macros used are actually present. 

Anyways, I guess the thing to do is to make the binder see the difference 
between the preprocessed source and the original source and to sort of merge 
the two. Sound sane?

More information about the KDevelop-devel mailing list