languages/cpp/parser: lexer error

Hamish Rodda rodda at kde.org
Tue Aug 26 08:27:15 UTC 2008


On Tuesday 26 August 2008 17:09:05 Marek Jasovsky wrote:
> 2008/8/26, Hamish Rodda <rodda at kde.org>:
> > Hi Marek,
> >
> > On Tuesday 26 August 2008 02:40:22 Marek Jasovsky wrote:
> >> Hi
> >>
> >> I found this weird behavior in cpp lexer in kdev4 while struggling to
> >> learn something about  duchain.
> >>
> >> in kdevplatform/plugins/duchainviewer/duchainmodel.cpp
> >>
> >> QModelIndex DUChainModel::index(int row, int column, const QModelIndex
> >> & parent) const // line 147
> >> { //line 148
> >>                                       #            #
> >> if (row < 0 || column < 0 || column > 0 || !m_chain) //line 149
> >>     return QModelIndex(); //line 150
> >>
> >> The kdev4 problems view gives me 2 errors, marked with # above the line
> >> (at the 3rd zero and at the last closing parenthesis), but the code
> >> compiles without errors.
> >>
> >> first: Expected token ')' after '>' found 'number_literal' (Lexer)
> >> 2nd: Unexpected token ')'
> >
> > This is a known issue. It is caused by an ambiguity in C++: there
> > _could_ be a template class called "column" with the arguments
> > inside the <>.
> >
> > I emailed Roberto about it a while back, but his suggestions are
> > technically difficult at the moment. It will need more thinking about
> > before we can properly fix it.
>
> Is your mail to Roberto on the mailing list, or was it just personal?
> I am also very interested in parsers, so I am curious: what was
> Roberto's solution?

It was a personal email, so I'll include it here...

On Sunday 13 July 2008 18:40:09 Roberto Raggi wrote:
> On Sun, Jul 13, 2008 at 1:19 AM, Hamish Rodda <rodda at kde.org> wrote:
> > Hi David / Roberto,
> >
> > The following code gives an error for the kdevelop4 c++ parser:
> >
> > if (z < 0 | z > 0)
>
> Yep, the code is ambiguous. The problem is the parser doesn't perform type
> checking while parsing, so there is pretty much no way to parse those
> expressions correctly.
>
> The general case of your example is "a < b | c > d". The code can be
> recognized as a declaration of "d" if "a" is a template name, and "b", "c"
> constant expressions.
>
> // file decl.cpp
> template <int> class a {};
> const int b = 1;
> const int c = 2;
> a<b|c> d; // parsed as declaration
>
>
> or, it can be recognized as an expression
>
> // expr.cpp
> int main() {
> int a, b, c, d;
> a < b | c > d;
> }
>
>
> OK, you can probably fix your particular case using backtracking, because a
> template-id (z<0|z>) followed by an int literal (0) is not valid, but I
> don't think you should do it. I mean you should try to fix the general
> case. You already have all the information you need (from the ud-chain), so
> you just have to parse template-ids only if the name is a template name.
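
To make Roberto's suggestion concrete, here is a minimal sketch of "parse a
template-id only if the name is a template name". It is not the KDevelop
parser: classify(), Parse and the set of template names are hypothetical
stand-ins for a lookup in the duchain, and the check is deliberately
simplified to looking at the first two tokens.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the KDevelop parser API.
enum class Parse { Declaration, Expression };

// Decide how to read a statement that starts with  name '<' ...
// The '<' is treated as a template bracket only when `name` is already
// known to be a template; otherwise it is read as the less-than operator.
Parse classify(const std::vector<std::string>& tokens,
               const std::set<std::string>& templateNames)
{
    assert(!tokens.empty());  // precondition: there is something to classify
    const bool startsTemplateId = tokens.size() >= 2
                               && tokens[1] == "<"
                               && templateNames.count(tokens[0]) > 0;
    return startsTemplateId ? Parse::Declaration : Parse::Expression;
}
```

With the tokens of "a < b | c > d ;", this gives Declaration when "a" is in
the set of template names and Expression when it is not, which matches the
two readings in decl.cpp and expr.cpp above.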

So, the fix really requires the parser to do one of two things:
1) Integrate with duchain parsing simultaneously, i.e. as the parser creates
the AST, the duchain builder (1st pass) visits the AST.  This would be a huge
headache to organise and would create many regressions, all for one lousy bug
fix.

2) Create two branches and let the duchain pick which is the correct branch.
This is much easier and is probably the road we have to take, but it means
saving the error messages on the AST nodes instead of in one list, so that only
the errors which actually apply are picked.  It may mean a small performance
penalty (hopefully not a large one).
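
A rough sketch of what option 2 could look like, purely to illustrate the
idea of keeping errors per branch. Branch, AmbiguousNode and resolve() are
made-up names, not existing AST or duchain classes, and the set of template
names stands in for whatever the duchain pass would actually know.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical types; not the actual KDevelop AST or duchain classes.
struct Branch {
    bool assumesTemplate;             // did this branch read '<' as a template bracket?
    std::vector<std::string> errors;  // errors recorded while parsing this branch only
};

struct AmbiguousNode {
    std::string name;     // the identifier in front of '<'
    Branch asTemplate;    // parse that treats name<...> as a template-id
    Branch asExpression;  // parse that treats '<' and '>' as operators
};

// Stand-in for the later duchain pass: once it is known which names are
// templates, keep the matching branch, so only its errors get reported.
const Branch& resolve(const AmbiguousNode& node,
                      const std::set<std::string>& templateNames)
{
    assert(node.asTemplate.assumesTemplate && !node.asExpression.assumesTemplate);
    return templateNames.count(node.name) ? node.asTemplate : node.asExpression;
}
```

For the "column < 0 || column > 0" case, the template branch would carry the
"Expected token ')'" error and the expression branch none, so resolving with
"column" not being a template would leave nothing in the problems view.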

Cheers,
Hamish.



