[Issue N23835] [PATCH] Files with non-utf8 names unaccessible from Qt when using utf8 locale

qt-bugs at trolltech.com qt-bugs at trolltech.com
Fri Jul 4 09:31:48 BST 2003


Hi Waldo,

On Thursday, 05. Jun 2003 17:36 Waldo Bastian wrote:
> [Now with patch]
>
> When using utf8 for filename encoding, Qt is unable to access files
> that do not have valid utf8 names. This can happen when a user on a
> Unix system with a utf8 locale accesses a CD that uses a different
> encoding for its files.
>
> The problem is rooted in the following equality:
> 	QFile::encodeName(QFile::decodeName(path)) == path
>
> Qt is able to process files properly as long as this equality holds
> true. Even when the actual encoding of a filename does not correspond
> to the encoding used by QFile, Qt will, as long as this equality
> holds, pass the same 8bit string that it receives from e.g. readdir(3)
> to system functions such as open(2). In such a case the visual
> representation of the filename will be incorrect, but operations on the
> file will continue to work as expected.
>
> When the equality does not hold true, Qt will pass an 8bit string to
> system functions such as open(2) that differs from the 8bit string it
> received from e.g. readdir(3). Such an action is likely to fail: the
> different 8bit string will most likely not point to an existing file
> or, worse, will point to a different file.
>
> Not every 8bit string is a valid utf8 sequence. When QFile uses a utf8
> codec, it replaces invalid utf8 sequences with QChar::replacement
> (U+FFFD) in the QString. When such a QString is then converted back to
> utf8, the resulting 8bit string is a valid utf8 sequence but is no
> longer identical to the original 8bit string.
>
> I would like to propose that QFile::decodeName/encodeName use a
> modified utf8 codec such that the conversion utf8 -> QString -> utf8
> always yields the original 8bit string, even if that string is not
> a valid utf8 sequence.
>
> I have attached a patch that illustrates what such a modified codec
> could look like. I have used 0xfffd as the escape character; perhaps
> another character, such as 0xffff, would be more suitable.
>
> I am aware that this very problem can be solved for KDE applications
> by providing our own encoding function via QFile::setEncodingFunction,
> but since this problem will affect Qt-only applications as well, it
> would be better if it could be solved in Qt itself.
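The round trip described above can be reproduced without Qt. The following is a minimal sketch, not the code from the attached patch (which is not reproduced in this archive): std::string stands in for the 8bit filename, std::u16string for the UTF-16 QString, and decodeUtf8Name/encodeUtf8Name are hypothetical helpers, not QFile API. With the escape flags off, it behaves like a plain utf8 codec that substitutes U+FFFD; with them on, it applies the 0xfffd + QChar(ch) escape proposed above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins, not Qt API: std::string plays the 8bit filename
// and std::u16string plays the UTF-16 QString.

// Append one Unicode code point to `out` as UTF-8.
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decode UTF-8 into UTF-16. A byte that does not start a well-formed sequence
// becomes U+FFFD (plain codec) or, with escapeInvalid, U+FFFD followed by
// U+00XX where XX is the raw byte. Simplification: overlong forms are accepted.
std::u16string decodeUtf8Name(const std::string& in, bool escapeInvalid) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t need = 0;  // number of continuation bytes expected
        char32_t cp = 0;
        bool ok = true;
        if (c < 0x80)                { cp = c; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; }
        else                         { ok = false; }  // invalid lead byte
        if (ok && i + need >= in.size()) ok = false;   // truncated sequence
        for (std::size_t k = 1; ok && k <= need; ++k) {
            const unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (cc & 0x3F);
        }
        if (ok && cp > 0x10FFFF) ok = false;
        if (!ok) {
            out.push_back(u'\uFFFD');
            if (escapeInvalid) out.push_back(static_cast<char16_t>(c));
            ++i;
            continue;
        }
        i += need + 1;
        if (cp >= 0x10000) {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Encode UTF-16 as UTF-8. With unescape, the pair U+FFFD, U+00XX is written
// back as the single raw byte XX; a lone surrogate becomes U+FFFD.
std::string encodeUtf8Name(const std::u16string& in, bool unescape) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool hasNext = i + 1 < in.size();
        if (unescape && cp == 0xFFFD && hasNext && in[i + 1] <= 0xFF) {
            out += static_cast<char>(in[i + 1]);
            ++i;
            continue;
        }
        if (cp >= 0xD800 && cp <= 0xDBFF && hasNext &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

In this sketch the Latin-1 name "caf\xe9" comes back from the plain codec as "caf\xef\xbf\xbd" (the equality fails), but comes back unchanged when the escape is used. The escape is ambiguous, though: a name that genuinely contains U+FFFD followed by a character up to U+00FF (for example the bytes EF BF BD 41) is re-encoded as the single byte 41.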

You're right; a solution to the problem you described is needed. However,
I don't like using 0xfffd+QChar(ch) for mapping these characters to
Unicode, as the forward and back transformations violate the utf8
encoding badly.

I've implemented a slightly different solution that maps these characters
to a surrogate pair in the supplementary private use area, as this should
hopefully lead to fewer conflicts. The only disadvantage is that
currently (until Qt has better surrogate handling) each of these
characters will show up as two boxes instead of one box followed by the
character mapped from Latin-1. The diff against qt-3.2 beta2 is attached.
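A rough sketch of the surrogate-pair mapping follows, in the same plain C++ setting (std::string for the 8bit name, std::u16string for QString) with hypothetical helper names. The concrete values are an assumption made for illustration: high surrogate 0xDBFF and low surrogate 0xDE00 plus the raw byte, i.e. code points U+10FE00 to U+10FEFF in Supplementary Private Use Area-B. The values used in the attached diff are not visible in this archive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASSUMPTION: illustrative values, not taken from the attached diff.
// An undecodable byte XX becomes the pair 0xDBFF, 0xDE00 + XX
// (code point U+10FE00 + XX, Supplementary Private Use Area-B).
constexpr char16_t kEscapeHigh = 0xDBFF;
constexpr char16_t kEscapeLowBase = 0xDE00;

// Append one Unicode code point to `out` as UTF-8.
void appendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Decode UTF-8 into UTF-16, escaping each byte that does not start a
// well-formed sequence as a private-use surrogate pair. Overlong forms
// are accepted (simplification).
std::u16string decodeUtf8NamePua(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::size_t need = 0;
        char32_t cp = 0;
        bool ok = true;
        if (c < 0x80)                { cp = c; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; }
        else                         { ok = false; }
        if (ok && i + need >= in.size()) ok = false;
        for (std::size_t k = 1; ok && k <= need; ++k) {
            const unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (cc & 0x3F);
        }
        if (ok && cp > 0x10FFFF) ok = false;
        if (!ok) {  // escape the raw byte
            out.push_back(kEscapeHigh);
            out.push_back(static_cast<char16_t>(kEscapeLowBase + c));
            ++i;
            continue;
        }
        i += need + 1;
        if (cp >= 0x10000) {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Encode UTF-16 as UTF-8, turning escape pairs back into their raw bytes.
std::string encodeUtf8NamePua(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        const bool hasNext = i + 1 < in.size();
        if (cp == kEscapeHigh && hasNext && in[i + 1] >= kEscapeLowBase &&
            in[i + 1] <= kEscapeLowBase + 0xFF) {
            out += static_cast<char>(in[i + 1] - kEscapeLowBase);
            ++i;
            continue;
        }
        if (cp >= 0xD800 && cp <= 0xDBFF && hasNext &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

Because each undecodable byte becomes two UTF-16 code units, a renderer without surrogate support draws two boxes, as noted above. Unlike the 0xfffd escape, a genuine U+FFFD in a name round-trips unchanged here. The remaining conflict in this sketch is a name that really contains an encoded code point in U+10FE00 to U+10FEFF (for example, the bytes F4 8F B9 81 are re-encoded as the single byte 41), which the choice of a private use range makes unlikely.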

Cheers,
Lars

--
Lars Knoll, Senior Software Engineer
Trolltech AS, Waldemar Thranes gt. 98, N-0175 Oslo, Norway
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-diff
Size: 3316 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20030704/fd913dac/attachment.diff>


More information about the kde-core-devel mailing list