Migrating Pology to Python 3
Karl Ove Hufthammer
karl at huftis.org
Sat Oct 8 11:21:12 BST 2022
Adrian Chaves wrote on 08.10.2022 12:13:
> I have debugged this issue and I believe the root cause is
> “addFilterHook name="normalize/noinvisible" on="pmsgstr"
> handle="noinvisible"”, defined in puretext.filters, which is included
> in ortography.rules. So I think this is another case where Python 3 is
> working as expected, and Python 2 was not.
Hmm. The ‘noinvisible’ hook should remove only invisible characters.
They are defined in normalize.py:
# As defined by http://www.unicode.org/faq/unsup_char.html.
_invisible_character_codepoints = ([]
    + [0x200C, 0x200D] # cursive joiners
    + list(range(0x202A, 0x202E + 1)) # bidirectional format controls
    + [0x00AD] # soft hyphen
    + [0x2060, 0xFEFF] # word joiners
    + [0x200B] # the zero width space
    + list(range(0x2061, 0x2064 + 1)) # invisible math operators
    + [0x115F, 0x1160] # Jamo filler characters
    + list(range(0xFE00, 0xFE0F + 1)) # variation selectors
)
But the non-breaking space (U+00A0) is not among these characters, so the
hook shouldn’t remove it (or replace it with a normal space).
--
Karl Ove Hufthammer
More information about the kde-i18n-doc mailing list