Cpp Parser & multibyte chars (bug 274430)

David Nolden david.nolden.kdevelop at art-master.de
Sun Nov 20 19:39:21 UTC 2011


The parser uses IndexedString directly, and we have defined that the
contents of IndexedString should be utf-8 encoded.

So, to get the encoding right, all we would have to do is:
1. Get the ranges right when the contents are utf-8 encoded
2. Convert contents which are not utf-8 encoded into utf-8 while reading them

Both are independent. However, I don't like the idea of using
".kateconfig" for configuring the encoding. That seems messy, because
this file is meant to configure the editor, and using the information
more extensively, even for closed files, feels like an obscure
side-effect.
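
However the source encoding ends up being determined, the conversion
from step 2 is simple in itself. A minimal sketch, assuming the file is
ISO-8859-1 (Latin-1) encoded; latin1ToUtf8 is a made-up helper for
illustration, not something that exists in KDevelop, and real code
would more likely go through Qt's codec support for arbitrary
encodings:

```cpp
#include <string>

// Hypothetical helper: re-encodes ISO-8859-1 (Latin-1) bytes as utf-8.
// Latin-1 is an assumption made for this sketch only.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size());
    for (char c : latin1) {
        const unsigned char byte = static_cast<unsigned char>(c);
        if (byte < 0x80) {
            // ASCII is encoded identically in Latin-1 and utf-8.
            utf8.push_back(static_cast<char>(byte));
        } else {
            // Code points 0x80..0xFF become a two-byte sequence:
            // 110000xx 10xxxxxx
            utf8.push_back(static_cast<char>(0xC0 | (byte >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (byte & 0x3F)));
        }
    }
    return utf8;
}
```

The result would then be what gets stored in IndexedString, so that
everything after the reading stage only ever sees utf-8.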

Doing the mapping while highlighting should not be too difficult,
although it would require some work. We would have to read the utf-8
encoded line, extract the specific set of column-offsets, and apply
those offsets before creating KTextEditor::Range from
RangeInRevision. This would need reading the utf-8 specification to
check how to extract the offsets. However, I'm pretty sure that the
utf-8 specification is easy enough regarding this.
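
To illustrate how little is needed from the specification, here is a
rough sketch of the offset extraction. It assumes that the parser's
columns count bytes of the utf-8 line and that the editor's columns
count UTF-16 code units, as a QString does. The function name is
hypothetical and the input is assumed to be valid utf-8:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: maps a byte offset within a utf-8 encoded line
// to a column counted in UTF-16 code units.
std::size_t utf16ColumnForByteOffset(const std::string& utf8Line,
                                     std::size_t byteOffset)
{
    assert(byteOffset <= utf8Line.size());
    std::size_t column = 0;
    for (std::size_t i = 0; i < byteOffset; ++i) {
        const unsigned char byte = static_cast<unsigned char>(utf8Line[i]);
        // Continuation bytes (10xxxxxx) belong to a character that was
        // already counted at its lead byte.
        if ((byte & 0xC0) == 0x80)
            continue;
        // Lead bytes of the form 11110xxx start a four-byte sequence,
        // which takes two code units (a surrogate pair) in UTF-16.
        // Every other character takes one.
        column += ((byte & 0xF8) == 0xF0) ? 2 : 1;
    }
    return column;
}
```

For the line "a\xC3\xA4" "b" (an "a", an a-umlaut, a "b"), the "b"
sits at byte offset 3 but at column 2, so a range starting there would
have to be shifted left by one before it is handed to the editor.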

Hmm, there might also be the possibility to apply the offsets while
parsing, the same way offsets introduced by macros are handled. Maybe
this should be checked first.

Greetings, David




More information about the KDevelop-devel mailing list