Third iteration of QMake parser, looking for a parser generator

Andreas Pakulat apaku at gmx.de
Thu Jul 5 11:55:20 UTC 2007


On 05.07.07 12:31:39, Roberto Raggi wrote:
> Il giorno 05/lug/07, alle ore 02:44, Andreas Pakulat ha scritto:
> 
> > On 03.07.07 03:35:59, Andreas Pakulat wrote:
> >> I'm going to give coco/R a more thorough test in the next days,  
> >> but in
> >> the meantime I welcome any comments.
> >
> > So I played around with coco/R now and a big plus is that I was able to
> > use QString/QChar for the lexer only by changing the templates. I'm not
> 
> cool!

Unfortunately it turns out that the lexer is otherwise unsuitable for a
qmake parser. The problem is quoted values combined with Coco/R's habit
of completely ignoring any whitespace. So I would have to create quoted
values inside the lexer, but that means I'd have to parse the quoted
string again later on to execute the functions in it and replace
variables, whereas I'd like to get individual AST nodes for each of
them. That would make life soooo much easier :)
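
To illustrate what I mean by individual pieces, here is a minimal sketch of a hand-written scanner for a quoted value that keeps whitespace as tokens instead of dropping it. The names (Kind, Token, scanQuotedValue) are made up for this example and are not part of Coco/R or kdev-pg. It assumes only the $$NAME and $${NAME} forms and leaves function calls out completely.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch, not taken from Coco/R or kdev-pg.
enum class Kind { Quote, Text, Whitespace, Variable };

struct Token {
    Kind kind;
    std::size_t begin; // offset of the first character
    std::size_t end;   // offset one past the last character
};

// Splits a double-quoted value into pieces and keeps whitespace as tokens.
// Assumption: only $$NAME and $${NAME} are recognised; function calls are not.
std::vector<Token> scanQuotedValue(const std::string& in)
{
    std::vector<Token> out;
    const std::size_t n = in.size();
    std::size_t i = 0;
    auto isName = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    while (i < n) {
        const std::size_t start = i;
        Kind kind;
        if (in[i] == '"') {
            kind = Kind::Quote;
            ++i;
        } else if (in[i] == ' ' || in[i] == '\t') {
            kind = Kind::Whitespace;
            while (i < n && (in[i] == ' ' || in[i] == '\t')) ++i;
        } else if (in.compare(i, 2, "$$") == 0) {
            kind = Kind::Variable;
            i += 2;
            const bool braced = i < n && in[i] == '{';
            if (braced) ++i;
            while (i < n && isName(in[i])) ++i;
            if (braced && i < n && in[i] == '}') ++i;
        } else {
            // Plain text runs until the next quote, whitespace or "$$".
            kind = Kind::Text;
            while (i < n && in[i] != '"' && in[i] != ' ' && in[i] != '\t'
                   && in.compare(i, 2, "$$") != 0) ++i;
        }
        assert(i > start); // every branch consumes at least one character
        out.push_back({kind, start, i});
    }
    return out;
}
```

With tokens like these the parser could build one AST node per variable or text piece, rather than re-parsing one big string token afterwards.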

On top of that: Providing a hand-written lexer still means providing
some "undocumented" coco_string_* functions that create, format, and
otherwise handle wchar_t*, because the code the parser-generator creates
relies on these.

So to conclude: kdev-pg will get another use-case :)

> > switch to kdev-pg + handwritten lexer. Along the way adding a
> > token stream that uses wchar_t* to properly support unicode.
> 
> Yeah, that hardcoded "char*" in the token stream is just wrong :-)

Yep, but it seems to be easily replaceable, because nothing except
"handwritten" code relies on it (the decoder and io stuff that is
normally just copy-pasted).

> an alternative solution is to kill the "text" field from the token_type
> declaration and provide a tokenText that works with QString/QStringRef.
> For example,
> 
>    struct token_type
>    {
>      int kind;
>      std::size_t begin;
>      std::size_t end;
>      // ### KILL THIS DECLARATION     char const *text;
>    };
> 
> in your subclass
> 
>    QStringRef tokenText(size_t index) const {
>      kdev_pg_token_stream::token_type t = _M_token_stream->token(index);
>      // contents is a QString; begin and end are plain offsets
>      return QStringRef(&contents, t.begin, t.end - t.begin);
>    }
> 
> See? unicode support and fast token manipulation because you don't
> have to create (or copy) QString(s).

Cool, my initial thought was to just replace the char const* with
QString, but indeed that would mean some more overhead due to string
creation/copying...
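
Just to convince myself of the no-copy idea without Qt at hand, here is a rough sketch using std::u16string_view as a stand-in for QStringRef and std::u16string as a stand-in for QString. The TokenStream class and its append() function are hypothetical and are not the kdev-pg token stream API. Only the begin/end offsets come from the struct quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Token without a text pointer, as in the struct quoted above.
struct Token {
    int kind;
    std::size_t begin; // offset of the first code unit in the source
    std::size_t end;   // offset one past the last code unit
};

// Hypothetical stand-in for a token stream subclass; not kdev-pg API.
class TokenStream {
public:
    explicit TokenStream(std::u16string contents)
        : m_contents(std::move(contents)) {}

    void append(int kind, std::size_t begin, std::size_t end)
    {
        assert(begin <= end && end <= m_contents.size());
        m_tokens.push_back({kind, begin, end});
    }

    // Returns a view into m_contents; no characters are copied.
    // The view is only valid while this TokenStream is alive.
    std::u16string_view tokenText(std::size_t index) const
    {
        const Token& t = m_tokens.at(index);
        return std::u16string_view(m_contents).substr(t.begin, t.end - t.begin);
    }

private:
    std::u16string m_contents;
    std::vector<Token> m_tokens;
};
```

The same lifetime caveat applies to QStringRef: the reference is only usable as long as the underlying string is still around.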

> Let me know if you decide to  
> write a hand-written lexer. I can send you an example of how I like  
> to write scanners, you can look at it and decide if you want to use a  
> similar approach.

That would be cool. I already have an idea of how to write it, but that
would be using regexps (possibly QRegExp).
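
Roughly what I have in mind, as a sketch only: a list of (token kind, regexp) rules that are tried in order at the current offset. It uses std::regex instead of QRegExp so that it compiles without Qt, and the rule set and the names (Kind, Token, lex) are invented for this example and nowhere near the real qmake grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for this sketch.
enum class Kind { Identifier, Operator, Whitespace, Unknown };

struct Token {
    Kind kind;
    std::size_t begin; // offset of the first character
    std::size_t end;   // offset one past the last character
};

std::vector<Token> lex(const std::string& in)
{
    // Rules are tried in order; the first one matching at the offset wins.
    static const std::vector<std::pair<Kind, std::regex>> rules = {
        {Kind::Identifier, std::regex("[A-Za-z_][A-Za-z0-9_]*")},
        {Kind::Operator,   std::regex("\\+=|-=|=")},
        {Kind::Whitespace, std::regex("[ \t]+")},
    };
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        Token tok{Kind::Unknown, pos, pos + 1}; // fallback: one unknown char
        for (const auto& rule : rules) {
            std::smatch m;
            // match_continuous anchors the match at the current offset.
            if (std::regex_search(in.begin() + pos, in.end(), m, rule.second,
                                  std::regex_constants::match_continuous)) {
                tok = {rule.first, pos,
                       pos + static_cast<std::size_t>(m.length(0))};
                break;
            }
        }
        assert(tok.end > tok.begin); // always make progress
        out.push_back(tok);
        pos = tok.end;
    }
    return out;
}
```

Whitespace comes out as a normal token here, so the parser can decide where it matters.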

Andreas

-- 
You are fighting for survival in your own sweet and gentle way.



