On KIO and non-unicode compatible paths
Christoph Feck
cfeck at kde.org
Mon Apr 9 22:19:46 UTC 2018
On 08.04.2018 13:59, Inkane wrote:
> I recently had a look at Bug 173097 (Cannot delete a file with "invalid"
> characters in its name), and unfortunately, this seems to be a
> surprisingly difficult issue to fix with how KIO is currently designed.
>[...]
> The root of the issue here is basically the way Qt handles file paths,
Since QFile::setEncodingFunction() no longer works, another way to
"hack" the conversion is to use QTextCodec::setCodecForLocale() within
our platform plugin. A specially crafted codec could map bytes that are
not valid UTF-8 to designated UTF-16 code units.
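A rough Python sketch of what such a codec would do is below. It assumes the MirOS-style mapping of each undecodable byte 0xNN to U+EFNN. The error-handler name "puaescape" and the helpers `decode_name`/`encode_name` are hypothetical. An actual fix would be a QTextCodec subclass in C++, not Python.

```python
import codecs

# Assumption: raw byte 0xNN (0x80..0xFF) is represented as U+EFNN.
PUA_BASE = 0xEF00

def _pua_escape(err):
    """Decode-side error handler: turn each undecodable byte into U+EFNN."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    if any(b < 0x80 for b in bad):  # ASCII is always valid UTF-8; be strict
        raise err
    return "".join(chr(PUA_BASE + b) for b in bad), err.end

codecs.register_error("puaescape", _pua_escape)

def decode_name(raw: bytes) -> str:
    """Hypothetical locale-codec decode: UTF-8, escaping invalid bytes."""
    return raw.decode("utf-8", "puaescape")

def encode_name(name: str) -> bytes:
    """Hypothetical locale-codec encode: U+EF80..U+EFFF become raw bytes again."""
    out = bytearray()
    for ch in name:
        cp = ord(ch)
        if 0xEF80 <= cp <= 0xEFFF:
            out.append(cp - PUA_BASE)  # restore the original byte
        else:
            out += ch.encode("utf-8")
    return bytes(out)

raw = b"caf\xe9.txt"  # Latin-1 "café.txt", not valid UTF-8
print(ascii(decode_name(raw)))  # 'caf\uefe9.txt'
print(encode_name(decode_name(raw)) == raw)  # True
```

The weak point of any private-use range is a file whose name legitimately contains such a character. Take U+EF80, stored as b"\xee\xbe\x80". It decodes to "\uef80" but `encode_name` turns that back into b"\x80", which names a different file.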
From some brief investigation, we could use either U+DC80...U+DCFF
(as Python 3 does) or U+EF80...U+EFFF (as MirOS does). The latter
range is also listed as "reserved for encoding hacks" in the
Under-ConScript Unicode Registry: http://www.kreativekorp.com/ucsur/
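One practical difference between the two ranges is easy to check. U+DC80...U+DCFF are lone low surrogates, which a strict UTF-8 encoder refuses, so an escaped byte can never collide with a name that was valid UTF-8 to begin with. U+EF80...U+EFFF are ordinary private-use code points that encode without complaint. The helper name `strict_utf8_ok` is hypothetical and used only for illustration.

```python
def strict_utf8_ok(text: str) -> bool:
    """Return True if `text` encodes as UTF-8 without any error handler."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

# Python's escape range: every code point is a lone low surrogate.
surrogate_range = [chr(cp) for cp in range(0xDC80, 0xDD00)]
# MirOS-style range: ordinary private-use characters.
pua_range = [chr(cp) for cp in range(0xEF80, 0xF000)]

print(any(strict_utf8_ok(ch) for ch in surrogate_range))  # False
print(all(strict_utf8_ok(ch) for ch in pua_range))        # True
```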
https://docs.python.org/3.3/howto/unicode.html says:
"Files in an Unknown Encoding
What can you do if you need to make a change to a file, but don’t know
the file’s encoding? If you know the encoding is ASCII-compatible and
only want to examine or modify the ASCII parts, you can open the file
with the surrogateescape error handler[...] The surrogateescape error
handler will decode any non-ASCII bytes as code points in the Unicode
Private Use Area ranging from U+DC80 to U+DCFF. These private code
points will then be turned back into the same bytes when the
surrogateescape error handler is used when encoding the data and writing
it back out."
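A minimal round trip with the standard surrogateescape handler described in the quote is shown below. The file name is an arbitrary example. On POSIX systems, Python's os.fsdecode() and os.fsencode() apply this same handler together with the filesystem encoding.

```python
raw = b"caf\xe9.txt"  # Latin-1 "café.txt", not valid UTF-8

# Decoding maps the stray byte 0xE9 to U+DCE9 instead of failing.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly.
back = name.encode("utf-8", "surrogateescape")
assert back == raw
```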
I can no longer find the MirOS/MirBSD reference, though.
--
Christoph Feck
More information about the Kde-frameworks-devel mailing list