D15068: Bindings: Correct handling of sources containing utf-8

Stefan Brüns noreply at phabricator.kde.org
Sat Aug 25 03:36:50 BST 2018


bruns created this revision.
bruns added a reviewer: Frameworks.
Herald added projects: Frameworks, Build System.
Herald added subscribers: kde-buildsystem, kde-frameworks-devel.
bruns requested review of this revision.

REVISION SUMMARY
  Depending on the locale, python3 may try to decode the source as ASCII
  when the file is opened in text mode. This fails as soon as the
  code contains non-ASCII UTF-8 sequences, e.g. copyright (©) symbols.
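
  The failure can be reproduced outside the bindings generator. The
  following is a minimal sketch, not the code from sip_generator.py: the
  header content and the helper names read_as_ascii/read_as_bytes are
  made up for illustration, and encoding="ascii" is passed explicitly to
  stand in for the default that text mode may pick under e.g. LANG=C.

```python
import os
import tempfile

# Hypothetical header content; the copyright sign is two bytes in UTF-8.
source = "// © 2018 Example Author\nclass Foo {};\n".encode("utf-8")

with tempfile.NamedTemporaryFile(suffix=".h", delete=False) as tmp:
    tmp.write(source)
    path = tmp.name


def read_as_ascii(filename):
    # Text mode with an ASCII codec (assumption: this mimics the locale default).
    with open(filename, "r", encoding="ascii") as f:
        return f.read()


def read_as_bytes(filename):
    # Binary mode never decodes, so the locale is irrelevant.
    with open(filename, "rb") as f:
        return f.read()


try:
    read_as_ascii(path)
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

raw = read_as_bytes(path)
os.remove(path)
```

  Reading in binary mode returns the bytes unchanged regardless of the
  locale, whereas the text-mode read raises UnicodeDecodeError.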
  
  While it is possible to specify the encoding when reading the file,
  this is undesirable for two reasons:
  
  - only a very small part of the source is processed via _read_source, so there is no need to decode the complete source and store it as string objects
  - the clang Cursor.extent.{start,end}.column values refer to bytes, not to (possibly multibyte) characters.
  
  python2 processes sources containing UTF-8 without error messages,
  but the wrong extent borders are an issue there as well.
  
  The practical impact is low, as the issue only manifests if there is a
  multibyte character in front of *and* on the same line as the read token.
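
  The effect of byte-based columns can be illustrated with a small
  sketch. It does not call libclang; the source line and the 0-based
  offsets are assumptions chosen to mimic an extent that covers the
  token "value", and slice_by_bytes/slice_by_chars are hypothetical
  helpers.

```python
# One source line with a multibyte character in front of the token.
line = "/* © */ int value;\n"
raw = line.encode("utf-8")

# Byte offsets of the token, as a byte-based extent would report them
# (0-based here for simplicity).
start = raw.index(b"value")
end = start + len(b"value")


def slice_by_bytes(raw_line, first, last):
    # Slice the undecoded bytes, then decode only the small fragment.
    return raw_line[first:last].decode("utf-8")


def slice_by_chars(text_line, first, last):
    # Slicing the decoded string treats byte offsets as character offsets.
    return text_line[first:last]


by_bytes = slice_by_bytes(raw, start, end)
by_chars = slice_by_chars(line, start, end)

# A token in front of the multibyte character is unaffected.
other = "int value; /* © */\n"
other_start = other.encode("utf-8").index(b"value")
unaffected = slice_by_chars(other, other_start, other_start + 5)
```

  With these inputs the byte-based slice yields "value", while the
  character-based slice is shifted by one position and yields "alue;".
  When the token precedes the multibyte character, both approaches
  agree.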

TEST PLAN
  Python3: Build any bindings whose sources contain non-ASCII codepoints,
  e.g. kcoreaddons. The unpatched version fails when using e.g. LANG=C.
  Python2: Both versions generate sources successfully.

REPOSITORY
  R240 Extra CMake Modules

REVISION DETAIL
  https://phabricator.kde.org/D15068

AFFECTED FILES
  find-modules/sip_generator.py

To: bruns, #frameworks
Cc: kde-frameworks-devel, kde-buildsystem, michaelh, ngraham, bruns