[konsole] [Bug 471483] New: Problems with C1 control codes (U+0080 through U+009F)

Frank Heckenbach bugzilla_noreply at kde.org
Mon Jun 26 22:20:47 BST 2023


https://bugs.kde.org/show_bug.cgi?id=471483

            Bug ID: 471483
           Summary: Problems with C1 control codes (U+0080 through U+009F)
    Classification: Applications
           Product: konsole
           Version: 22.12.3
          Platform: Debian stable
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: emulation
          Assignee: konsole-devel at kde.org
          Reporter: f.heckenbach at fh-soft.de
  Target Milestone: ---

Konsole recently (apparently between versions 20 and 22) added support for
8-bit C1 control codes (U+0080 through U+009F). While formally correct, in
practice it seems to do more harm than good:

On the one hand, I don't know of any application that actually outputs these
characters. Wikipedia (https://en.wikipedia.org/wiki/C0_and_C1_control_codes)
seems to agree: "the 8-bit forms of these codes are almost never used. CSI, DCS
and OSC are used to control text terminals and terminal emulators, but almost
always by using their 7-bit escape code representations."
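
For reference, here is a minimal sketch of the relationship between the 8-bit
C1 codes and the 7-bit escape representations mentioned in the quote: each C1
code corresponds to ESC followed by a byte that is 0x40 lower. The helper name
c1_to_7bit is made up for this illustration and is not part of Konsole.

```python
# Illustrative helper (hypothetical name): map an 8-bit C1 control code to
# its 7-bit escape form, i.e. ESC followed by a byte in the range 0x40..0x5F.
ESC = "\x1b"


def c1_to_7bit(ch: str) -> str:
    code = ord(ch)
    if not 0x80 <= code <= 0x9F:
        raise ValueError("not a C1 control code")
    return ESC + chr(code - 0x40)


# A few of the C1 codes named in this report and in the Wikipedia quote.
for name, ch in [("DCS", "\x90"), ("CSI", "\x9b"), ("OSC", "\x9d"), ("APC", "\x9f")]:
    print(name, f"U+{ord(ch):04X}", "->", repr(c1_to_7bit(ch)))
```

So CSI (U+009B) is equivalent to ESC [ and APC (U+009F) to ESC _.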

On the other hand, they can actively cause problems (which contributed to their
not being used much). In the past, the trouble was with environments that were
not 8-bit clean; these days it is with UTF-8. To quote Wikipedia again, "the
UTF-8 encodings of their corresponding codepoints are two bytes long like their
escape code forms (for instance, CSI at U+009B is encoded as the bytes 0xC2,
0x9B in UTF-8), so there is no advantage to using them rather than the
equivalent two-byte escape sequence. When these codes appear in modern
documents, web pages, e-mail messages, etc., they are usually intended to be
printing characters at that position in a proprietary encoding such as
Windows-1252 or Mac OS Roman that use the C1 codes to provide additional
graphic characters."
... or, I'd like to add, mojibake. E.g. the German letter "ß" is U+00DF with
UTF-8 encoding 0xC3 0x9F. I had accidentally left a Konsole window set to
ISO-8859-1 while a long-running program (with UTF-8 output) was running in it.
The byte 0x9F was then read as U+009F, the C1 code APC, so from the first
occurrence of that letter Konsole waited for the end of the supposed APC
sequence, which never came. It therefore swallowed all further output,
probably including some important messages from the program. Sure, mojibake is
not nice in general, but for languages with few non-ASCII characters such as
German, it is quite tolerable. Swallowing all output makes matters much worse.
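
The byte-level effect described above can be checked with a short Python
sketch. It only illustrates the encoding mismatch and does not model Konsole
itself; the function name misread_as_latin1 is invented for this example.

```python
# Illustrative helper (hypothetical name): what a receiver sees when UTF-8
# output is decoded as ISO-8859-1.
def misread_as_latin1(text: str) -> str:
    return text.encode("utf-8").decode("iso-8859-1")


utf8_bytes = "ß".encode("utf-8")
print(utf8_bytes.hex(" "))  # the two UTF-8 bytes of U+00DF

seen = misread_as_latin1("ß")
# First code point is the printable "Ã" (U+00C3); the second is U+009F,
# which is the C1 control code APC.
print([f"U+{ord(c):04X}" for c in seen])

# The point from the Wikipedia quote: 8-bit CSI costs two bytes in UTF-8,
# exactly as many as the 7-bit form ESC [.
print(len("\x9b".encode("utf-8")), len("\x1b[".encode("utf-8")))
```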

So I'd suggest adding at least an option to disable their handling.

STEPS TO REPRODUCE
1. Set encoding to ISO-8859-1 in Konsole window
2. Run in that window (this should be independent of the shell and locale
settings, though a UTF-8 locale must be installed):
LC_ALL=C.UTF-8 /usr/bin/printf 'Gro\u00df\n'; echo Good

OBSERVED RESULT
GroÃ

(Output cut off and window "dead", or possibly revived by control characters in
shell prompt.)

EXPECTED RESULT
GroÃ?
Good
%

(Mojibake in first line, but second line correct.)
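
The difference between the observed and the expected result can be mimicked
with a toy model. This is only a sketch under simplifying assumptions, not
Konsole's actual emulation code: the function toy_render and its handle_c1
flag are hypothetical, only APC (U+009F) and its terminator ST (U+009C) are
modelled, and any other C1 code is shown as "?".

```python
# Toy model only: NOT Konsole's implementation. Names are hypothetical.
APC = "\u009f"  # Application Program Command, opens a string sequence
ST = "\u009c"   # String Terminator, closes it


def toy_render(text: str, handle_c1: bool) -> str:
    shown = []
    swallowing = False
    for ch in text:
        if swallowing:
            # Inside an APC string: discard everything until ST arrives.
            if ch == ST:
                swallowing = False
            continue
        if handle_c1 and ch == APC:
            swallowing = True
        elif 0x80 <= ord(ch) <= 0x9F:
            # C1 handling disabled (or code not modelled): show a placeholder.
            shown.append("?")
        else:
            shown.append(ch)
    return "".join(shown)


# What the terminal receives when UTF-8 output is decoded as ISO-8859-1.
received = "Groß\nGood\n".encode("utf-8").decode("iso-8859-1")

print(repr(toy_render(received, handle_c1=True)))   # output cut off after "GroÃ"
print(repr(toy_render(received, handle_c1=False)))  # mojibake, but "Good" survives
```

With handle_c1=False the toy model produces the expected result above, which
is roughly the behaviour the suggested option would restore.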


