On 11/18/2011 05:55 PM, Milian Wolff wrote:
> Hey all,
>
> I've spent some time today investigating bug 274430 [1], which shows
> that our C++ parser breaks on C strings containing wide characters.
>
> Andreas tried to convince me on IRC that this is "broken code", since
> anything besides ASCII in C++ code is undefined.
>
> I strongly disagree: just because it's undefined doesn't mean one must
> not use it. Sure, if you are writing portable code one *should* not use
> it, but at least at my university, and probably in science in general,
> people like UTF-8 symbols in the output of their computation results.
> And since most of them use UTF-8 anyway, they simply put UTF-8
> characters into their code.
>
> So I'd like to fix this, but how? The big issue I see is that our
> parser operates on QByteArrays (why?) instead of QString, and as such
> loses all encoding information. Hence our lexer needs two steps to
> iterate over an "ä" character instead of one, and thus thinks it is two
> characters wide...
>
> Any ideas on how to fix this without rewriting the whole parser to use
> QStrings?
>
> [1]: https://bugs.kde.org/show_bug.cgi?id=274430
>
> Bye
Hi!

Well, I think we should consider it a bug. Maybe it's not high priority,
but it would be a good thing to have; C++11 does support Unicode string
literals [1], after all.
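For reference, a minimal sketch of the literal forms C++11 added
(nothing KDevelop-specific, just what the standard specifies):

    // C++11 Unicode string literals: the prefix pins down the encoding
    // of the literal, independent of the source file's charset.
    const char     utf8[]  = u8"ä"; // UTF-8:  two code units {0xC3, 0xA4} plus NUL
    const char16_t utf16[] = u"ä";  // UTF-16: one code unit  {0x00E4} plus NUL
    const char32_t utf32[] = U"ä";  // UTF-32: one code unit  {0x000000E4} plus NUL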
It's "just" a matter of offsets anyway, so switching to QString probably
wouldn't pay off memory-wise, but I don't know enough about UTF-8 to say
for sure.
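To illustrate the offset mismatch Milian describes (a minimal sketch;
the actual parser internals may differ):

    #include <QByteArray>
    #include <QString>
    #include <QDebug>

    int main()
    {
        // "ä" is a single character, but its UTF-8 encoding is two bytes.
        const QByteArray raw("\xC3\xA4");                   // what the lexer iterates over
        const QString decoded = QString::fromUtf8(raw.constData());

        qDebug() << raw.size();     // 2 -- byte offsets, so "ä" looks two wide
        qDebug() << decoded.size(); // 1 -- one QChar, the width users expect
        return 0;
    }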
Aleix
[1] http://en.wikipedia.org/wiki/C%2B%2B11#New_string_literals