Cpp Parser & multibyte chars (bug 274430)

Sun Nov 20 16:15:19 UTC 2011

On 20.11.11 12:01:30, Sven Brauch wrote:
> I suffer from the same problem, although I don't know exactly *how*
> (there are random failures, but I don't know when exactly they occur -- I
> think if files do not use utf8 and do not explicitly specify this, but
> still use non-ascii characters). However it's quite a bit easier in
> python, because _most_ files have the encoding denoted in the header
> (it's required when using fancy characters in the source code). So I'd
> be interested in knowing the text editor's encoding for a specific
> file, too.

Python is a lot easier exactly because of this, i.e. for any given file
it's well defined what encoding to use: either ASCII if no encoding is
given, or whatever the encoding declaration defines (not sure right now
whether Py3 defines UTF-8 to be the default). So the parser can itself
determine which encoding to use; any file with non-ASCII content but
without an encoding line is defined to be broken by the Python standard.
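
To illustrate what "the parser can itself determine the encoding" can
look like, here is a minimal sketch, assuming Python 3, where the default
without a declaration is UTF-8 (PEP 3120) rather than ASCII. It relies on
the standard library's tokenize.detect_encoding(); the helper names
source_encoding and is_well_formed are made up for this example and are
not part of any existing parser.

```python
import io
import tokenize


def source_encoding(data: bytes) -> str:
    """Return the encoding declared by a BOM or coding line, else the default."""
    # detect_encoding() inspects at most the first two lines of the source.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


def is_well_formed(data: bytes) -> bool:
    """Check that the whole file decodes with the encoding it declares."""
    try:
        data.decode(source_encoding(data))
    except (SyntaxError, UnicodeDecodeError, LookupError):
        # SyntaxError: bad or conflicting declaration, or undecodable header.
        # UnicodeDecodeError: bytes later in the file do not fit the encoding.
        return False
    return True


declared = "# -*- coding: cp1252 -*-\ns = 'é'\n".encode("cp1252")
undeclared = "x = 1\ny = 2\ns = 'é'\n".encode("cp1252")

print(source_encoding(declared), is_well_formed(declared))
print(is_well_formed(undeclared))
```

The second file is the "broken by definition" case: non-ASCII bytes in a
legacy encoding with no declaration, so decoding with the default fails.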

For C++ there's no such thing available, which makes this so
complicated to do right.

Andreas