r++: making strings from tokens?

Steven T. Hatton hattons at globalsymmetry.com
Thu Sep 1 17:36:15 UTC 2005


I believe I have a general idea of what the members of the various AST classes 
represent, but it is rather difficult to figure out how to handle each 
instance.  I'm trying to write a visitor that prints a representation of each 
node in the tree.  In some cases a member variable of type std::size_t might 
represent the value of the TokenStream cursor where the token is located.  In 
other cases it represents the actual numeric value of the character, e.g., 
tilde.  Some tokens have their .extra member union set to a pointer to a 
NameSymbol. In the case of a '}', the union is set to the numeric value of 
'}'.
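
To make the ambiguity concrete, here is a simplified picture of the layout as I understand it. TokenSketch, NameSymbolSketch, KindSketch and describe are my own stand-ins for illustration, not the actual r++ declarations. The point is only that the same bits in extra mean different things depending on the token's kind, so kind seems to be the only thing that can say which member is valid.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-ins; the real r++ declarations differ.
struct NameSymbolSketch {
  std::string text;
  std::string as_string() const { return text; }
};

enum KindSketch { Kind_identifier, Kind_right_brace, Kind_other };

struct TokenSketch {
  KindSketch kind;
  union {
    const NameSymbolSketch* symbol;  // meaningful only for identifiers
    std::size_t right_brace;         // meaningful only for '}'
  } extra;
};

// Choose the union member from kind alone, never from the union's own
// bit pattern, which cannot tell a pointer from a character code.
std::string describe(const TokenSketch& tk) {
  switch (tk.kind) {
    case Kind_identifier:  return tk.extra.symbol->as_string();
    case Kind_right_brace: return "}";
    default:               return "<other>";
  }
}
```

Whether the real Token::kind is reliable enough to drive such a switch is an assumption I have not verified.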

I have to do more investigation to determine if I can trust that Token::extra 
is reliably initialized to zero.  Experimental evidence suggests that it is.  
Putting this all together I did this:

In tokens.h I added:

#include "lexer.h"
#include <ostream>

std::ostream& operator<<(std::ostream& out, const Token& tk);
//---------------------------------------

//In tokens.cpp I added:
std::ostream& operator<<(std::ostream& out, const Token& tk){
  if(tk.extra.symbol) {
    if('}' == tk.extra.right_brace) return out<<"}";
    return out<<tk.extra.symbol->as_string();
  }
  return out<<token_name(tk.kind);
}
//-----------------------------

//in main.cpp I added this to the if(ast) block
if (ast) {
  // Check the index bound first, then stop at the EOF token.
  for(size_t i = 0; i < p.token_stream.size()
        && TOKEN_KIND(p.token_stream[i].kind) != Token_EOF; ++i)
    std::cerr << p.token_stream[i] << "\n";
  std::cerr << std::endl;

//----------------------------

// This is the input file to parse
//HelloWorld.cpp
#include <iostream>
#include <string>

class HelloWorld {
public:
  HelloWorld(const std::string& message_ = "Hello World" ) :_message(message_) 
{
}
  std::ostream& print(std::ostream& out) const { return out << _message; }
private:
  std::string _message;
};

std::ostream& operator<<(std::ostream& out, const HelloWorld& hw) {
  return hw.print( out );
}

int mine () {
  HelloWorld hw;
  std::cout<<hw<<std::endl;
  return 0;
}
//------------------------------------------

When I parse HelloWorld.cpp, I get this:

class
HelloWorld
{
public
:
HelloWorld
(
const
std
scope
string
&
message_
=
"Hello World"
)
:
_message
(
message_
)
{
}
std
scope
ostream
&
print
(
std
scope
ostream
&
out
)
const
{
return
out
shift
_message
;
}
private
:
std
scope
string
_message
;
}

One thing I don't like about this is that it replaces the `::' with `scope', 
and the `<<' with `shift'.  But that's minor in comparison to the fact that 
my approach is very haphazard. Is it safe? Is it the best way of retrieving 
the human-readable representation of the translation unit?
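
For the `scope'/`shift' issue, one workaround I can imagine is a small lookup table applied to whatever name gets printed. This is only a sketch: spelling_for is a hypothetical helper of mine, and the table holds just the two pairs I observed in the output above. If `shift' is also used for `>>', a table keyed on the name alone cannot tell the two apart.

```cpp
#include <map>
#include <string>

// Hypothetical helper: maps the names seen in the printed output back to
// their source spellings. Names not in the table pass through unchanged.
std::string spelling_for(const std::string& name) {
  static const std::map<std::string, std::string> table = {
    {"scope", "::"},
    {"shift", "<<"},  // assumption: only observed for '<<' so far
  };
  const auto it = table.find(name);
  return it == table.end() ? name : it->second;
}
```

In the operator<< above, this would wrap the fallback case, e.g. out << spelling_for(token_name(tk.kind)), assuming token_name returns something convertible to std::string.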

-- 
Regards,
Steven




More information about the KDevelop-devel mailing list