Migrating Pology to Python 3

Karl Ove Hufthammer karl at huftis.org
Sat Oct 8 11:21:12 BST 2022


Adrian Chaves wrote on 08.10.2022 12:13:
> I have debugged this issue and I believe the root cause is 
> “addFilterHook name="normalize/noinvisible" on="pmsgstr" 
> handle="noinvisible"”, defined in puretext.filters, which is included 
> in ortography.rules. So I think this is another case where Python 3 is 
> working as expected, and Python 2 was not.

Hmm. The ‘noinvisible’ hook should remove only invisible characters. 
They are defined in normalize.py:

# As defined by http://www.unicode.org/faq/unsup_char.html.
_invisible_character_codepoints = ([]
     + [0x200C, 0x200D] # cursive joiners
     + list(range(0x202A, 0x202E + 1)) # bidirectional format controls
     + [0x00AD] # soft hyphen
     + [0x2060, 0xFEFF] # word joiners
     + [0x200B] # the zero width space
     + list(range(0x2061, 0x2064 + 1)) # invisible math operators
     + [0x115F, 0x1160] # Jamo filler characters
     + list(range(0xFE00, 0xFE0F + 1)) # variation selectors
)

But the non-breaking space (U+00A0) is not among these characters, so 
this hook shouldn’t remove it (or replace it with a normal space).
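
A quick way to check this outside Pology is sketched below. The codepoint 
list is copied from the excerpt above; strip_invisible() is a hypothetical 
helper written only for this test, not the actual ‘noinvisible’ 
implementation, so it shows what the list alone implies rather than what 
the hook really does.

```python
# Codepoint list copied from the normalize.py excerpt quoted above.
_invisible_character_codepoints = ([]
    + [0x200C, 0x200D]                   # cursive joiners
    + list(range(0x202A, 0x202E + 1))    # bidirectional format controls
    + [0x00AD]                           # soft hyphen
    + [0x2060, 0xFEFF]                   # word joiners
    + [0x200B]                           # the zero width space
    + list(range(0x2061, 0x2064 + 1))    # invisible math operators
    + [0x115F, 0x1160]                   # Jamo filler characters
    + list(range(0xFE00, 0xFE0F + 1))    # variation selectors
)

# Assumption: removal is a plain per-character filter over this set.
_invisible_chars = frozenset(chr(cp) for cp in _invisible_character_codepoints)


def strip_invisible(text):
    """Hypothetical stand-in for the hook: drop only the listed characters."""
    return "".join(ch for ch in text if ch not in _invisible_chars)


# Sample with a non-breaking space, a soft hyphen and a zero width space.
sample = "10\u00a0km so\u00adft zero\u200bwidth"
cleaned = strip_invisible(sample)

# U+00A0 is not in the list, so it must survive the filter.
nbsp_listed = 0x00A0 in _invisible_character_codepoints
nbsp_kept = "\u00a0" in cleaned
```

If the non-breaking space disappears in the real rule check even though it 
survives a filter like this one, the cause is probably somewhere other 
than this list.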


-- 
Karl Ove Hufthammer



More information about the kde-i18n-doc mailing list