[konsole] [Bug 395171] Remove UTF-16 and other non ASCII compatible encodings

Egmont Koblinger bugzilla_noreply at kde.org
Fri Apr 30 22:49:45 BST 2021


https://bugs.kde.org/show_bug.cgi?id=395171

--- Comment #8 from Egmont Koblinger <egmont at gmail.com> ---
(In reply to Jayadevan from comment #7)

I stopped working on terminal emulation about a year ago. Yet, I'm making a
single exception here to respond (i.e. I most likely won't follow up, so please
don't write expecting a response from me).


> Please reject such proposals, as those are discriminatory.

I firmly refute this claim.

There is nothing discriminatory in the proposal whatsoever.

The reason behind this request – and this should be obvious to everyone who
takes time to really _understand_ the post and the linked article – is that
UTF-16 (and a few friends) as the _I/O_ encoding *does not work*, *never
worked* and even more importantly, *cannot be fixed to work*.

More precisely, you can write a terminal emulator that speaks this encoding,
but when placed in its context (i.e. surrounded by a Unix kernel, libc, higher
level libraries, tools, apps, tmux-likes, other computers to ssh to/from, etc.)
it won't do anything that makes sense, since all the surrounding infrastructure
only supports ASCII-compatible encodings for the communication with the
terminal.
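
To make "ASCII-compatible" concrete, here is a minimal Python sketch (not part
of the original comment). The sample text and variable names are assumptions
chosen for illustration; the byte values follow from the encodings themselves.
It shows that UTF-16 output contains NUL bytes for plain ASCII text, and that an
ordinary non-ASCII letter can encode to bytes that a tty commonly treats as
special characters.

```python
# Illustrative sketch: sample text and names are assumptions.
command = "echo foo\n"

utf8_bytes = command.encode("utf-8")
utf16_bytes = command.encode("utf-16-le")

# For pure ASCII text, UTF-8 output is byte-for-byte identical to ASCII.
is_ascii_compatible = utf8_bytes == command.encode("ascii")

# UTF-16 pairs every ASCII character with a NUL byte, which breaks any
# code that treats '\0' as a string terminator.
nul_count = utf16_bytes.count(0)

# U+0403 (CYRILLIC CAPITAL LETTER GJE) is the byte pair 03 04 in UTF-16LE.
# 0x03 and 0x04 are the usual defaults for the tty's INTR (Ctrl-C) and
# EOF (Ctrl-D) special characters. In UTF-8 every byte of a non-ASCII
# character is >= 0x80, so no such collision can occur.
gje_utf16 = "\u0403".encode("utf-16-le")
gje_utf8 = "\u0403".encode("utf-8")

print(utf8_bytes.hex(" "))
print(utf16_bytes.hex(" "))
print(nul_count, gje_utf16.hex(" "), gje_utf8.hex(" "))
```

The second point is the important one: a byte-oriented line discipline cannot
tell whether 0x03 is half of a letter or an interrupt request.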

In order to support UTF-16 as the _I/O_ encoding, in a way that you actually
get a working ecosystem around the terminal with this encoding, you'd need
modifications to the kernel's tty handling (line discipline, stty special
characters etc.), the kernel's tty-accessing API (to enforce UTF-16, or at
least an even number of bytes on all operations that write to / read from a
tty, or work with 16-bit units instead of 8-bit ones, in order to exclude the
possibility of going out of sync, causing permanent breakages), accompanied
with the corresponding changes in standards (e.g. POSIX), you'd need these
changes in libc too, you'd need heavy modifications in all the apps (e.g.
change from '\0'-terminated byte strings to wide strings or whatnot); you'd
need to throw out any shell script that contains even an "echo foo" (in an
ASCII-compatible encoding) because that would outright break the terminal if
sent out as-is, you'd need to rethink "cat" (how to transfer potentially odd
number of bytes into a channel that expects even numbers), you'd need to add
UTF-16 locales, and so on and so forth... I just sketched up a tiny subset of
the problems. You'd need to essentially rethink and adjust all the APIs,
libraries, every single tool or application inside the terminal, literally
everything. All this in order to create a system that's utterly incompatible
with what we already have, and whose user-visible outcome is not the slightest
bit better. It's clearly not going to happen, and even if it did happen, it
would be clearly harmful.
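
The "going out of sync" problem and the "echo foo" problem mentioned above can
be sketched in a few lines of Python (not part of the original comment). The
sample strings and variable names are assumptions for illustration. A single
stray byte shifts every following 16-bit unit in a UTF-16 stream, whereas a
UTF-8 decoder loses only the offending byte.

```python
# Illustrative sketch: sample text and names are assumptions.
text = "hello"

# One stray byte ahead of an otherwise valid UTF-16LE stream.
utf16_stream = b"A" + text.encode("utf-16-le")
# Drop the dangling last byte so that the length is even again.
even_len = len(utf16_stream) - len(utf16_stream) % 2
utf16_seen = utf16_stream[:even_len].decode("utf-16-le")

# One stray (invalid) byte ahead of a valid UTF-8 stream.
utf8_stream = b"\xff" + text.encode("utf-8")
utf8_seen = utf8_stream.decode("utf-8", errors="replace")

# Plain ASCII output, as produced by "echo foo", read as UTF-16LE.
ascii_as_utf16 = b"foo\n".decode("utf-16-le")

# Every unit after the stray byte is wrong, and stays wrong.
print([hex(ord(c)) for c in utf16_seen])
# Only the stray byte is replaced; "hello" survives intact.
print(repr(utf8_seen))
# Two unrelated code units instead of "foo" and a newline.
print([hex(ord(c)) for c in ascii_as_utf16])
```

UTF-8 recovers because lead bytes and continuation bytes occupy disjoint value
ranges, so a decoder can always find the next character boundary. UTF-16 over a
byte channel has no such marker.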

There is no politics or discrimination at all here, this is purely technical.


> UTF-8 is Anglo-centric. UTF-16 treats each writing system more fairly.

UTF-8 can represent the exact same things as UTF-16. They support all writing
systems to the very same extent.

The only sense in which one can perhaps claim that UTF-8 is Anglo-centric, is
that it uses 1 byte for English letters vs. 3 bytes for CJK (Chinese, Japanese,
Korean) symbols; whereas UTF-16 uses 2 for both. Given that an English letter
represents, well, a single letter of a word, whereas a CJK symbol represents a
syllable or an entire word, I actually do think UTF-8's 1:3 split is a much
fairer system. (Not to mention that the typical work happening inside a terminal
is usually English-centric.)
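
The byte counts above, and the claim that both encodings represent the same
text, can be checked with a short Python sketch (not part of the original
comment). The sample words and the helper name byte_counts are assumptions for
illustration.

```python
# Illustrative sketch: sample words and the helper name are assumptions.
def byte_counts(text):
    """Return (UTF-8 length, UTF-16 length) of text, in bytes."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

english = "terminal"  # 8 ASCII letters
cjk = "端末"          # 2 CJK symbols from the Basic Multilingual Plane

english_counts = byte_counts(english)
cjk_counts = byte_counts(cjk)

# Both encodings round-trip the same text without loss.
mixed = english + cjk
same_text = (
    mixed.encode("utf-8").decode("utf-8")
    == mixed.encode("utf-16-le").decode("utf-16-le")
    == mixed
)

print(english_counts, cjk_counts, same_text)
```

Per symbol this gives 1 vs. 2 bytes for the English letters and 3 vs. 2 bytes
for the CJK symbols, matching the figures in the paragraph above.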

By the way: who cares? With today's network speeds, combined with the tiny
amount of terminal data compared to any other activity you do over any network,
the difference in the byte count just simply does not matter at all.


> Since KDE Internally uses UTF-16, UTF-16 should be supported.

Trying to make a connection between the _internal_ encoding and the _I/O_
encoding is not justified at all.

As an occasional user of Konsole I don't have the slightest idea what encoding
it uses _internally_, and it should be this way. Users shouldn't care, users
shouldn't need to care. If users needed to care, it would mean that the
developers did a terrible job. The internal encoding is subject to change by
the developers at any time, without any user noticing it.

What _I/O_ encodings Konsole supports (or, in this case: incorrectly claims to
support) is an utterly independent story.


> Also, UTF-16 is used by KDE, QT, C/C++ (From ICU), Java, Windows,
> JavaScript, Android, DartVM, Dart Language, and modern frameworks
> like Flutter.

You see: they made a choice. They don't offer alternatives, they decided on one
encoding.

The same goes for terminals. They decided on UTF-8; unsurprisingly, since for
millions of technical reasons, the encoding needs to be ASCII-compatible,
whereas there's a natural need to encode any text.

Many modern terminal emulators only support UTF-8 encoding and nothing else.
Many other terminal emulators support some legacy deprecated ones for backwards
compatibility, back from the days when the world hadn't settled on UTF-8, but
those at least work. And then there's Konsole offering some choices that never
worked, don't work, will never work due to millions of technical issues.

The direction is not to offer alternatives senselessly. Especially not if such
an alternative would require redesigning and rewriting pretty much every single
component of the ecosystem. The direction is one single mode of operation that
is perfect for everybody. As for the terminals' _I/O_ encoding, this is UTF-8.

No culture, no language, no writing system, no human being was discriminated
against by this choice. The current UTF-8 approach supports everything that the
UTF-16 approach, if it were reasonable and feasible to implement – which it is
not –, could support.

Choosing one technical solution over the other – even if the other was viable
too, which is not the case here – is not discrimination. It is proper
engineering.

The current bug report is about the removal of a claimed feature that doesn't
work, never worked, and cannot be made to work.

-- 
You are receiving this mail because:
You are the assignee for the bug.


More information about the konsole-devel mailing list