Encoding issues in cmake files

Aleix Pol aleixpol at kde.org
Thu Feb 7 22:42:14 UTC 2013


On Thu, Feb 7, 2013 at 6:05 PM, Andreas Pakulat <apaku at gmx.de> wrote:

> Hi,
>
> On Thu, Feb 7, 2013 at 4:27 PM, Aleix Pol <aleixpol at kde.org> wrote:
> > On Thu, Feb 7, 2013 at 3:30 PM, Andreas Pakulat <apaku at gmx.de> wrote:
> >>
> >> Hi,
> >>
> >> just happened to see
> >>
> >>
> >> http://quickgit.kde.org/?p=kdevelop.git&a=commit&h=15bee005db20b380c0b4335d03afb0f17a4e750c
> >> on IRC, and I think the fix is wrong. CMake files do not have an
> >> encoding specification, so either cmake dictates an encoding in its
> >> documentation (or its own parser) or it does not. In the latter case a
> >> user needs to be able to decide which encoding to use for parsing
> >> cmake files; just using their local 8-bit encoding can be wrong as well,
> >> in particular for UTF-16 files or for people who regularly work with two
> >> different encodings (that's often the case in Asia). It can also already
> >> break on Windows with UTF-8 encoded cmake files, since the local 8-bit
> >> encoding there is latin1.
> >
> > Well, it will be broken either way, because we're assuming latin1 anyway.
> > Right now it works the same way locally, at least on my system, so it
> > maps better to how cmake works.
>
> Actually that's not at all how cmake works, as far as I can see. What
> exactly didn't work before this change and works now?
>
> I'm wondering because, as far as I can see, CMake completely ignores any
> file encoding and works solely on the files as bytes. This is also
> just fine, since the CMake language only has ASCII characters in its
> identifiers and tokens. So the only places where non-ASCII characters
> can show up are literal strings and data read into cmake through
> the file() command and similar. In the latter case cmake again doesn't
> need to care: a regex is applied to the bytes of the read file content,
> which requires the CMakeLists file and the file being read to have the
> same encoding (or both to use ASCII only). And if literal strings
> contain non-ASCII characters, then the behaviour is more or less
> undefined. If they're printed, cmake simply prints the bytes, which
> results in garbage output when run in a terminal that uses a different
> encoding. And if they're passed on a compiler command line, you're
> again passing bytes, and the result will again depend on your locale.
>
> Unfortunately we can't do that, since all our APIs require QString,
> which has a defined encoding, and hence input needs to be converted to
> it correctly.
>
> So with all that said, I think local8bit is actually the right thing
> here: people running a latin1 environment with UTF-8 encoded
> cmake files containing non-ASCII characters should get problems with
> that on the command line as well.
>
> Just use the attached (latin1-encoded) CMake file and run cmake on it
> in a utf-8 environment.
>
> Andreas
>

What works now is that

add_subdirectory(blàh)

matches the actual directory on a UTF-8 system, producing a valid
project tree.

Aleix
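A minimal sketch, in Python rather than KDevelop's C++, of the two views discussed in this thread: CMake comparing raw bytes, versus an IDE that must first decode the file into text. It assumes a UTF-8 filesystem, and the names LINE, DIR_NAME and decode() are hypothetical. Python's "utf-8" and "latin-1" codecs stand in for QString::fromUtf8() and QString::fromLatin1(); in a UTF-8 locale, QString::fromLocal8Bit() should behave like the UTF-8 case.

```python
# Hypothetical illustration, not KDevelop code: how the bytes of one CMakeLists.txt
# line compare against a directory name, depending on encodings and decoding.

def decode(data: bytes, codec: str) -> str:
    """Decode bytes to text, substituting U+FFFD for invalid sequences."""
    return data.decode(codec, errors="replace")

LINE = "add_subdirectory(blàh)"
DIR_NAME = "blàh"  # the directory as it exists on disk

utf8_file = LINE.encode("utf-8")      # CMakeLists.txt saved as UTF-8
latin1_file = LINE.encode("latin-1")  # CMakeLists.txt saved as latin1

# 1. Byte-level view (how CMake itself behaves): the name only matches when
#    the file and the filesystem use the same encoding.
fs_name = DIR_NAME.encode("utf-8")  # assumption: UTF-8 filesystem
print(fs_name in utf8_file)    # True
print(fs_name in latin1_file)  # False

# 2. Text-level view (what a QString-based API needs): decoding with a codec
#    that doesn't match the file garbles the name.
print(DIR_NAME in decode(utf8_file, "utf-8"))    # True
print(DIR_NAME in decode(utf8_file, "latin-1"))  # False: "à" becomes "Ã" + U+00A0
print(DIR_NAME in decode(latin1_file, "utf-8"))  # False: byte 0xE0 becomes U+FFFD
```

Whichever single codec is chosen, files saved in the other encoding come out garbled. That is the trade-off between the latin1 default and local8bit that the thread is weighing.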


More information about the KDevelop-devel mailing list