On KIO and non-unicode compatible paths

Christoph Feck cfeck at kde.org
Mon Apr 9 22:19:46 UTC 2018


On 08.04.2018 13:59, Inkane wrote:
> I recently had a look at Bug 173097 (Cannot delete a file with "invalid"
> characters in its name), and unfortunately, this seems to be a
> surprisingly difficult issue to fix with how KIO is currently designed.
>[...]
> The root of the issue here is basically the way Qt handles file paths,

Since QFile::setEncodingFunction() no longer works, another way to 
"hack" the conversion is to call QTextCodec::setCodecForLocale() within 
our platform plugin. A specially crafted codec could map bytes that are 
not valid UTF-8 to otherwise unused UTF-16 code units.

From some minor investigation, we could use either U+DC80...U+DCFF 
(what Python 3 uses) or U+EF80...U+EFFF (what MirOS uses). The latter 
range is also mentioned as "reserved for encoding hacks" in the 
Under-ConScript Unicode Registry: http://www.kreativekorp.com/ucsur/
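
To make the difference between the two ranges concrete, here is a rough
Python sketch (not Qt code, and not a proposal for the actual codec).
The helper names decode_escaped/encode_escaped are made up for this
mail, and the MirOS-style mapping "byte 0xNN -> U+EFNN" is my assumption
based only on the range quoted above.

```python
SURROGATE_BASE = 0xDC00  # Python's surrogateescape: byte 0xNN -> U+DCNN
PUA_BASE = 0xEF00        # ASSUMED MirOS-style mapping: byte 0xNN -> U+EFNN


def decode_escaped(raw: bytes, base: int = PUA_BASE) -> str:
    """Decode UTF-8, moving undecodable bytes to base + byte value."""
    # Let Python escape the bad bytes first, then shift them to `base`.
    text = raw.decode("utf-8", errors="surrogateescape")
    shift = base - SURROGATE_BASE
    return "".join(
        chr(ord(ch) + shift) if 0xDC80 <= ord(ch) <= 0xDCFF else ch
        for ch in text
    )


def encode_escaped(text: str, base: int = PUA_BASE) -> bytes:
    """Inverse of decode_escaped: turn escape code points back into bytes."""
    shift = base - SURROGATE_BASE
    restored = "".join(
        chr(ord(ch) - shift) if base + 0x80 <= ord(ch) <= base + 0xFF else ch
        for ch in text
    )
    return restored.encode("utf-8", errors="surrogateescape")


latin1_name = b"caf\xe9.txt"            # 0xE9 alone is not valid UTF-8
escaped = decode_escaped(latin1_name)   # 0xE9 becomes U+EFE9
restored = encode_escaped(escaped)      # and turns back into 0xE9
```

One trade-off this shows: U+EF80...U+EFFF are ordinary private-use code
points with a valid UTF-8 encoding, so a file name that really contains
the UTF-8 bytes for U+EF80 would come back as the single byte 0x80. The
U+DC80...U+DCFF range avoids that, because lone surrogates cannot appear
in valid UTF-8, but the resulting strings are not well-formed UTF-16.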

https://docs.python.org/3.3/howto/unicode.html says:
"Files in an Unknown Encoding

What can you do if you need to make a change to a file, but don’t know 
the file’s encoding? If you know the encoding is ASCII-compatible and 
only want to examine or modify the ASCII parts, you can open the file 
with the surrogateescape error handler[...] The surrogateescape error 
handler will decode any non-ASCII bytes as code points in the Unicode 
Private Use Area ranging from U+DC80 to U+DCFF. These private code 
points will then be turned back into the same bytes when the 
surrogateescape error handler is used when encoding the data and writing 
it back out."
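
For illustration, the round trip described in that passage looks
roughly like this in Python 3. The file name is just an example I made
up; the only thing used is the standard "surrogateescape" error handler.

```python
# A Latin-1 encoded name: the byte 0xE9 on its own is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Decoding with surrogateescape maps the bad byte 0xE9 to U+DCE9
# instead of raising UnicodeDecodeError.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original byte.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

# Without the handler, the escaped code point cannot be encoded at all.
try:
    text_name.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

The last step is the catch for us: any code that passes such a string
through a strict UTF-8 encoder will fail or mangle the name, so the
escape has to be undone by the same layer that introduced it.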

I can no longer find the MirOS/MirBSD reference, though.

-- 
Christoph Feck



More information about the Kde-frameworks-devel mailing list