Cpp Parser & multibyte chars (bug 274430)

Fri Nov 18 18:11:09 UTC 2011

On 18.11.11 18:28:07, Milian Wolff wrote:
> On Friday 18 November 2011 17:55:18 Milian Wolff wrote:
> > Hey all
> > 
> > I've spent some time today and investigated bug 274430 [1], which shows that
> > our C++ parser breaks on C-Strings containing wide chars.
> > 
> > Andreas tried to convince me in IRC that this is "broken code", since
> > anything besides ASCII in C++ code is undefined.
> 
> PovAddict pointed out that we also get confused by multibyte chars in 
> comments, which of course is just as bad!
> 
> I've added a unit test for that now - still left to see how we are supposed to 
> fix this...

As I just said on IRC, I was partially wrong about that assertion. I
still think the exact example from the bug report is broken code.

However, the case of using QString::fromUtf8 with UTF-8 encoded C++ files
is fine. Likewise, it is a bug if our C++ parser reads a QByteArray,
creates markers into it, and then nothing converts those markers when
going from QByteArray to QString.
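
To illustrate the marker problem: positions recorded while scanning raw bytes count bytes, whereas QString is indexed in UTF-16 code units, so the two drift apart after the first non-ASCII character. Below is a minimal sketch of the conversion, assuming the data is valid UTF-8 and the marker sits on a character boundary. The name utf8OffsetToUtf16Index is hypothetical (it is neither KDevelop nor Qt API), and std::string stands in for QByteArray so the snippet builds without Qt.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not existing KDevelop or Qt API.
// Maps a byte offset into UTF-8 data to the matching index in UTF-16
// code units, which is the unit QString is indexed in.
// Assumptions: the input is valid UTF-8 and byteOffset lies on a
// character boundary.
std::size_t utf8OffsetToUtf16Index(const std::string& utf8, std::size_t byteOffset)
{
    assert(byteOffset <= utf8.size());

    std::size_t index = 0;
    for (std::size_t i = 0; i < byteOffset; ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) == 0x80) {
            continue; // continuation byte, already counted with its lead byte
        }
        // A 4-byte sequence (lead byte >= 0xF0) becomes a surrogate pair
        // in UTF-16; every other character is a single code unit.
        index += (c >= 0xF0) ? 2 : 1;
    }
    return index;
}
```

For pure ASCII input the function returns the offset unchanged, which is why the mismatch only shows up once a string literal or comment contains multibyte characters.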

One major complication: detecting the encoding is not always going to
work, which is why Kate offers to change the encoding. So essentially the
parser needs to be able to ask Kate whether the user chose a specific
encoding for that file and use that when decoding the QByteArray. That
said, kdelibs has quite good auto-detection support these days, AFAIK
based on what Firefox does.
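
The lookup order described above could be sketched roughly as follows. This is only an illustration of the precedence (explicit user choice first, then auto-detection, then a default). chooseEncoding and its parameters are hypothetical names, and how the user's choice or the detection result would actually be obtained from Kate or kdelibs is left out. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper, not existing Kate, kdelibs, or KDevelop API.
// Picks the encoding name to use when decoding a file's raw bytes:
//   1. the encoding the user explicitly selected in the editor, if any
//   2. otherwise the result of auto-detection, if it produced one
//   3. otherwise a fallback (assumed here to be UTF-8)
std::string chooseEncoding(const std::optional<std::string>& userChoice,
                           const std::optional<std::string>& detected,
                           const std::string& fallback = "UTF-8")
{
    assert(!fallback.empty());

    if (userChoice) {
        return *userChoice; // the user's explicit choice always wins
    }
    if (detected) {
        return *detected;   // fall back to whatever detection guessed
    }
    return fallback;
}
```

The point of the ordering is that auto-detection is only a guess, so it should never override what the user set by hand.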

Andreas