Cpp Parser & multibyte chars (bug 274430)
Esben Mose Hansen
kde at mosehansen.dk
Sat Nov 19 14:21:50 UTC 2011
On 2011-11-18 17:55, Milian Wolff wrote:
> Andreas tried to convince me in IRC that this is "broken code", since anything
> besides ASCII in C++ code is undefined.
That is not true, however. According to the standard (2.2):
1. Physical source file characters are mapped, in an implementation defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Trigraph sequences (2.4) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.3) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e., using the \uXXXX notation), are handled equivalently.)
So any character can be valid in source code; which characters are accepted, and how they are mapped, is implementation-defined. Later on, 2.14.5 explains how such characters can appear in string literals.
So I am pretty sure it is perfectly valid to write "½" or "å" or whatever.
However, exactly what happens is implementation-defined (as I recall, there is a rather large section in the GCC manual about this).
--
very kind regards,
Esben Mose Hansen