Encoding issues in cmake files

Thu Feb 7 17:05:28 UTC 2013

Hi,

On Thu, Feb 7, 2013 at 4:27 PM, Aleix Pol <aleixpol at kde.org> wrote:
> On Thu, Feb 7, 2013 at 3:30 PM, Andreas Pakulat <apaku at gmx.de> wrote:
>>
>> Hi,
>>
>> just happened to see
>>
>> http://quickgit.kde.org/?p=kdevelop.git&a=commit&h=15bee005db20b380c0b4335d03afb0f17a4e750c
>> on irc and I think the fix is wrong. CMake files do not have an
>> encoding specification, so either cmake dictates an encoding in its
>> documentation (or its own parser) or it does not. In the latter case a
>> user needs to be able to decide which encoding to use for parsing
>> cmake files; just using his local 8-bit encoding can be wrong as well,
>> in particular for UTF-16 files or for people regularly working with two
>> different encodings (that's the case a lot in Asia). It can also already
>> break on Windows with UTF-8 encoded cmake files, since the local 8-bit
>> encoding there is latin1.
>
> Well, it will be broken either way, because we're assuming latin1 anyway.
> Right now, locally it works the same way, at least on my system, so it maps
> better to how cmake works.

Actually, that's not at all how cmake works, as far as I can see. What
exactly didn't work that started to work after this change?

I'm wondering because, as far as I can see, CMake completely ignores any
file encoding; it works solely on the files as bytes. This is also
just fine, since the CMake language only has ASCII characters in its
identifiers and tokens. So the only places where non-ASCII characters
can show up are literal strings and data read into cmake through
the file command and similar. In the latter case cmake again doesn't
need to care: a regex is applied against the bytes of the
read file content, which requires the CMakeLists file and the file
being read to have the same encoding (or both to use ASCII only). And if
literal strings use non-ASCII characters, then the behaviour is
more or less undefined. If they're printed, cmake simply prints the
bytes, which results in garbage output when run in a terminal that uses
a different encoding. And if they're passed on to a compiler
command line, you're again passing bytes, and the result will
again depend on your locale.
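
To make the byte-level point concrete, here is a minimal Python sketch
(Python is used only for illustration; this is not what cmake does
internally). The helper name shown_in_terminal is made up; it simply
decodes bytes the way a terminal with a given encoding would interpret
them.

```python
def shown_in_terminal(raw: bytes, terminal_encoding: str) -> str:
    """Interpret raw bytes the way a terminal using terminal_encoding would.

    Hypothetical helper for illustration; undecodable bytes become U+FFFD.
    """
    return raw.decode(terminal_encoding, errors="replace")


# The same string literal as it would be stored in two file encodings.
latin1_bytes = "Hello Wörld".encode("latin-1")  # ö is the single byte 0xF6
utf8_bytes = "Hello Wörld".encode("utf-8")      # ö is the two bytes 0xC3 0xB6

# Matching file and terminal encodings round-trip cleanly.
matching = shown_in_terminal(latin1_bytes, "latin-1")

# latin1 bytes in a UTF-8 terminal: 0xF6 is not valid UTF-8 here,
# so it is shown as the replacement character U+FFFD.
latin1_in_utf8 = shown_in_terminal(latin1_bytes, "utf-8")

# UTF-8 bytes in a latin1 terminal: each of the two bytes is shown
# as its own character (U+00C3 followed by U+00B6).
utf8_in_latin1 = shown_in_terminal(utf8_bytes, "latin-1")
```

The bytes never change in any of these cases; only their interpretation
does, which is why the output depends on the environment.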

Unfortunately we can't just work on raw bytes like that, since all our
API requires QString, which has a defined encoding, and hence the input
needs to be converted to it correctly.
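
What such a conversion amounts to can be sketched as follows, again in
Python rather than Qt code. decode_local8bit is a hypothetical name that
mimics the idea of decoding with the locale's 8-bit encoding; it is not
the KDevelop implementation, and the sample file content is assumed to be
latin1-encoded.

```python
import locale
from typing import Optional


def decode_local8bit(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode file bytes to text, defaulting to the locale's encoding.

    Hypothetical helper; undecodable bytes become U+FFFD instead of raising.
    """
    if encoding is None:
        # Assumption: the locale's preferred encoding stands in for
        # the "local 8-bit" encoding.
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")


# A latin1-encoded CMake line, decoded under two different assumptions.
raw = 'message(STATUS "Hello Wörld")'.encode("latin-1")
as_latin1 = decode_local8bit(raw, "latin-1")
as_utf8 = decode_local8bit(raw, "utf-8")

# The command name and keyword are pure ASCII, so they come out the
# same either way; only the string literal differs.
prefix_latin1 = as_latin1.split('"')[0]
prefix_utf8 = as_utf8.split('"')[0]
```

So the structure of the file survives a wrong guess; only the non-ASCII
contents of literals are affected.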

So with all that said, I think local8bit is actually the right thing
here. People running a latin1 environment and having UTF-8 encoded
cmake files with non-ASCII characters will run into problems with
that on the command line as well.

Just use the attached (latin1-encoded) CMake file and run cmake on it
in a utf-8 environment.

Andreas
-------------- next part --------------
message(STATUS "Hello Wörld")