On KIO and non-unicode compatible paths
    Christoph Feck 
    cfeck at kde.org
       
    Mon Apr  9 22:19:46 UTC 2018
    
    
  
On 08.04.2018 13:59, Inkane wrote:
> I recently had a look at Bug 173097 (Cannot delete a file with "invalid"
> characters in its name), and unfortunately, this seems to be a
> surprisingly difficult issue to fix with how KIO is currently designed.
>[...]
> The root of the issue here is basically the way Qt handles file paths,
Since QFile::setEncodingFunction() no longer works, another way to 
"hack" the conversion is to use QTextCodec::setCodecForLocale() within 
our platform plugin. A specially crafted codec could map bytes that are 
not valid UTF-8 to reserved UTF-16 code units instead of discarding them.
From some brief investigation, we could use either U+DC80...U+DCFF 
(what Python 3 uses) or U+EF80...U+EFFF (what MirOS uses). The latter 
code range is also listed as "reserved for encoding hacks" in the 
Under-ConScript Unicode Registry: http://www.kreativekorp.com/ucsur/
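
To make the mapping concrete, here is a rough sketch in Python (not Qt 
code, and not an existing KIO API). The function names and the "base" 
parameter are made up for illustration; the assumption is that each 
undecodable byte 0x80...0xFF maps to base + (byte - 0x80), with base 
being 0xDC80 or 0xEF80.

```python
def decode_with_escape(raw: bytes, base: int) -> str:
    """Decode UTF-8; map each undecodable byte b to chr(base + b - 0x80)."""
    out = []
    pos = 0
    while pos < len(raw):
        chunk = raw[pos:]
        try:
            out.append(chunk.decode("utf-8"))
            break
        except UnicodeDecodeError as exc:
            # Keep the valid prefix, then escape the offending bytes.
            out.append(chunk[:exc.start].decode("utf-8"))
            out.extend(chr(base + b - 0x80) for b in chunk[exc.start:exc.end])
            pos += exc.end
    return "".join(out)


def encode_with_escape(text: str, base: int) -> bytes:
    """Inverse mapping: escaped code points become the original raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if base <= cp < base + 0x80:
            out.append(cp - base + 0x80)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


# A Latin-1 encoded name, which is not valid UTF-8.
raw = b"caf\xe9.txt"
for base in (0xDC80, 0xEF80):
    text = decode_with_escape(raw, base)
    assert encode_with_escape(text, base) == raw
```

One difference between the two ranges shows up in this sketch: valid 
UTF-8 can legitimately encode U+EF80...U+EFFF, so a name that really 
contains such a character would be changed on the way back, whereas 
strictly valid UTF-8 never contains the surrogates U+DC80...U+DCFF.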
https://docs.python.org/3.3/howto/unicode.html says:
"Files in an Unknown Encoding
What can you do if you need to make a change to a file, but don’t know 
the file’s encoding? If you know the encoding is ASCII-compatible and 
only want to examine or modify the ASCII parts, you can open the file 
with the surrogateescape error handler[...] The surrogateescape error 
handler will decode any non-ASCII bytes as code points in the Unicode 
Private Use Area ranging from U+DC80 to U+DCFF. These private code 
points will then be turned back into the same bytes when the 
surrogateescape error handler is used when encoding the data and writing 
it back out."
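
For reference, the behaviour quoted above can be tried directly with 
Python's built-in "surrogateescape" error handler. The file name below 
is an invented example; this only demonstrates the Python side and says 
nothing about how a Qt codec would have to be written.

```python
# A Latin-1 encoded name; the byte 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# Undecodable bytes become lone surrogates in U+DC80...U+DCFF.
name = raw.decode("utf-8", errors="surrogateescape")
escaped = name[3]

# Encoding with the same handler restores the original bytes.
roundtrip = name.encode("utf-8", errors="surrogateescape")

# Without the handler, the lone surrogate cannot be encoded.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```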
I can no longer find the MirOS/MirBSD reference, though.
-- 
Christoph Feck
    
    