Migrating Pology to Python 3

Adrian Chaves adrian at chaves.io
Sat Oct 8 12:23:10 BST 2022


It turned out the markup filter was the culprit. It is supposed to 
convert the HTML tags in messages to plain text, and in doing so it 
normalized spaces, replacing every run of whitespace characters with a 
single, regular space. After the switch to Python 3, that normalization 
started to include non-breaking spaces (U+00A0) as well.

I do not think this filter should normalize spaces at all in parts of 
a message not affected by HTML tags, but that may need a discussion. 
For now, I have restored the Python 2 behavior of only normalizing ASCII 
spaces: 
https://invent.kde.org/sdk/pology/commit/0da46cdb3802c03b0930aeb70f781256d9fe6e69
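
To illustrate the difference, here is a minimal sketch. It assumes the 
normalization is done with a "\s+" regular expression, which may not be 
exactly what the filter does, and the function names are made up for 
this example. In Python 3, "\s" in a str pattern matches all Unicode 
whitespace, including U+00A0, unless re.ASCII is passed:

```python
import re

# Hypothetical helpers for illustration only, not Pology's actual code.
_UNICODE_WS = re.compile(r"\s+")          # Python 3 default for str patterns
_ASCII_WS = re.compile(r"\s+", re.ASCII)  # restricts \s to [ \t\n\r\f\v]


def normalize_unicode_spaces(text):
    """Collapse every run of Unicode whitespace, non-breaking spaces included."""
    return _UNICODE_WS.sub(" ", text)


def normalize_ascii_spaces(text):
    """Collapse only runs of ASCII whitespace, leaving U+00A0 untouched."""
    return _ASCII_WS.sub(" ", text)


sample = "10\u00a0km  per\thour"
print(repr(normalize_unicode_spaces(sample)))
print(repr(normalize_ascii_spaces(sample)))
```

The first call turns the non-breaking space into a regular one, while 
the second keeps it and only collapses the double space and the tab.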

On 2022-10-08 12:21, Karl Ove Hufthammer wrote:

> Adrian Chaves wrote on 08.10.2022 12:13:
> 
>> I have debugged this issue and I believe the root cause is 
>> "addFilterHook name="normalize/noinvisible" on="pmsgstr" 
>> handle="noinvisible"", defined in puretext.filters, which is included 
>> in ortography.rules. So I think this is another case where Python 3 is 
>> working as expected, and Python 2 was not.
> 
> Hmm. The 'noinvisible' hook should remove only invisible characters. 
> They are defined in normalize.py:
> 
> # As defined by http://www.unicode.org/faq/unsup_char.html.
> _invisible_character_codepoints = ([]
>     + [0x200C, 0x200D] # cursive joiners
>     + list(range(0x202A, 0x202E + 1)) # bidirectional format controls
>     + [0x00AD] # soft hyphen
>     + [0x2060, 0xFEFF] # word joiners
>     + [0x200B] # the zero width space
>     + list(range(0x2061, 0x2064 + 1)) # invisible math operators
>     + [0x115F, 0x1160] # Jamo filler characters
>     + list(range(0xFE00, 0xFE0F + 1)) # variation selectors
> )
> 
> But the non-breaking space (U+00A0) is not among these characters, and 
> shouldn't be removed (or replaced by a normal space).