Files with non-utf8 names unaccessible from Qt when using utf8 locale
bastian at kde.org
Thu Jun 5 15:41:56 BST 2003
When using utf8 for filename encoding, Qt is unable to access files that do
not have valid utf8 names. This can happen when a user on a Unix system with
a utf8 locale accesses a CD that uses a different encoding for its files.
The problem is rooted in the following equality:
QFile::encodeName(QFile::decodeName(path)) == path
Qt is able to process files properly as long as this equality holds true. Even
when the actual encoding of a filename does not correspond to the encoding
used by QFile, Qt will, as long as this equality holds, pass the same 8bit
string that it receives from e.g. readdir(3) to system functions such as
open(2). In such a case the visual representation of the filename will be
incorrect but actions on the file will continue to work as expected.
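To illustrate the case where the equality still holds, here is a minimal sketch in plain C++ (no Qt). It assumes a single-byte codec such as Latin-1, where every byte maps to exactly one character; decodeLatin1 and encodeLatin1 are hypothetical stand-ins for QFile::decodeName and QFile::encodeName, not Qt code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for decodeName with a Latin-1 codec:
// every byte becomes the code point with the same value.
std::u32string decodeLatin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char32_t>(b));
    }
    return out;
}

// Hypothetical stand-in for encodeName with a Latin-1 codec:
// code points above 0xFF cannot be represented and become '?'.
std::string encodeLatin1(const std::u32string& text) {
    std::string out;
    for (char32_t c : text) {
        out.push_back(c <= 0xFF ? static_cast<char>(static_cast<unsigned char>(c)) : '?');
    }
    return out;
}
```

A utf8-encoded name such as "caf\xC3\xA9.txt" decodes to nine characters here instead of eight, so it is displayed wrongly, but encoding it again yields the original bytes and open(2) would still receive the right name.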
When the equality does not hold true, Qt will pass an 8bit string to system
functions such as open(2) that differs from the 8bit string it received from
e.g. readdir(3). Such action is likely to fail: this different 8bit string
will most likely not point to an existing file or, worse, point to a different
file.
Not every 8bit string is a valid utf8 sequence. When QFile uses a utf8 codec,
it replaces invalid utf8 sequences with QChar::replacement (0xfffd) in the QString. When
such a QString is then converted back to utf8 again, the resulting 8bit string
is a valid utf8 sequence but no longer identical to the original 8bit string.
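The failure can be reproduced without Qt. The following is a rough sketch of a utf8 codec that substitutes 0xfffd for each invalid byte, which is the behaviour described above; the function names are made up for illustration and this is not the Qt implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const char32_t kReplacement = 0xFFFD;

// Length of the valid utf8 sequence starting at pos (code point stored in cp),
// or 0 if the bytes at pos do not form a valid sequence.
std::size_t decodeOne(const std::string& s, std::size_t pos, char32_t& cp) {
    unsigned char b0 = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;
    char32_t min = 0;
    if (b0 < 0x80) { cp = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;
    if (pos + len > s.size()) return 0;
    for (std::size_t i = 1; i < len; ++i) {
        unsigned char b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, surrogates and out-of-range values.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Lossy decode: each invalid byte is replaced by 0xfffd.
std::u32string decodeUtf8Lossy(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        char32_t cp = 0;
        std::size_t n = decodeOne(bytes, i, cp);
        if (n == 0) { out.push_back(kReplacement); i += 1; }
        else { out.push_back(cp); i += n; }
    }
    return out;
}

// Plain utf8 encoder.
std::string encodeUtf8(const std::u32string& text) {
    std::string out;
    for (char32_t c : text) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else if (c < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (c >> 18)));
            out.push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

With a Latin-1 name such as "caf\xE9.txt", the round trip produces "caf\xEF\xBF\xBD.txt": valid utf8, but a different 8bit string than the one on disk.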
I would like to propose that QFile::decodeName/encodeName use a modified utf8
codec such that the conversion utf8 -> QString -> utf8 always results in the
original 8bit string, even if that string is not a valid utf8 sequence.
I have attached a patch that illustrates what such a modified codec could look
like. I have used 0xfffd as the escape character; maybe another character such as
0xffff would be more suitable.
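The patch itself is not reproduced here. As a rough idea of how an escaping codec of this kind might work, the sketch below (plain C++, no Qt) stores each invalid byte as the escape character followed by a character carrying the byte value, and reverses that on encoding. The escaping scheme, the treatment of a literal 0xfffd in the input, and all function names are assumptions made for illustration and may differ from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Escape character; 0xfffd is the value mentioned in the post.
const char32_t kEscape = 0xFFFD;

// Length of the valid utf8 sequence at pos (code point stored in cp), or 0 if invalid.
std::size_t utf8SequenceAt(const std::string& s, std::size_t pos, char32_t& cp) {
    unsigned char b0 = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;
    char32_t min = 0;
    if (b0 < 0x80) { cp = b0; return 1; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else return 0;
    if (pos + len > s.size()) return 0;
    for (std::size_t i = 1; i < len; ++i) {
        unsigned char b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Decode utf8, keeping every undecodable byte as (escape, byte value).
// A literal escape character in the input is escaped byte by byte as well,
// so that the encoder can never confuse it with an escaped byte.
std::u32string decodeEscaped(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        char32_t cp = 0;
        std::size_t n = utf8SequenceAt(bytes, i, cp);
        if (n == 0 || cp == kEscape) {
            out.push_back(kEscape);
            out.push_back(static_cast<unsigned char>(bytes[i]));
            i += 1;
        } else {
            out.push_back(cp);
            i += n;
        }
    }
    return out;
}

// Append one code point as ordinary utf8.
void appendUtf8(std::string& out, char32_t c) {
    if (c < 0x80) {
        out.push_back(static_cast<char>(c));
    } else if (c < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (c >> 6)));
        out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    } else if (c < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (c >> 12)));
        out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (c >> 18)));
        out.push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
}

// Encode back to bytes: (escape, value in 0x80..0xFF) becomes the raw byte.
std::string encodeEscaped(const std::u32string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == kEscape && i + 1 < text.size() &&
            text[i + 1] >= 0x80 && text[i + 1] <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(text[i + 1])));
            ++i;
        } else {
            appendUtf8(out, text[i]);
        }
    }
    return out;
}
```

In this sketch valid names decode exactly as with a normal utf8 codec, while a Latin-1 name such as "caf\xE9.txt" survives the round trip unchanged. The price is that such a name shows an extra escape character when displayed.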
I am aware that this very problem can be solved for KDE applications by
providing our own encoding function via QFile::setEncodingFunction, but since
this problem will affect Qt-only applications as well, it would be better if
it could be solved in Qt itself.
bastian at kde.org -=|[ SuSE, The Linux Desktop Experts ]|=- bastian at suse.com