Files with non-utf8 names inaccessible from Qt when using a utf8 locale

Waldo Bastian bastian at kde.org
Thu Jun 5 15:41:56 BST 2003


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

When using utf8 for filename encoding, Qt is unable to access files that do 
not have valid utf8 names. This can happen when a user on a Unix system with 
a utf8 locale accesses a CD that uses a different encoding for its files.

The problem is rooted in the following equality:
	QFile::encodeName(QFile::decodeName(path)) == path

Qt is able to process files properly as long as this equality holds true. Even 
when the actual encoding of a filename does not correspond to the encoding 
used by QFile, Qt will, as long as this equality holds, pass the same 8bit 
string that it receives from e.g. readdir(3) to system functions such as 
open(2). In such a case the visual representation of the filename will be 
incorrect, but actions on the file will continue to work as expected.

When the equality does not hold true, Qt will pass an 8bit string to system 
functions such as open(2) that differs from the 8bit string it received from 
e.g. readdir(3). Such an action is likely to fail: this different 8bit string 
will most likely not point to an existing file or, worse, will point to a 
different file.

Not every 8bit string is a valid utf8 sequence. When QFile uses a utf8 codec, 
it replaces invalid utf8 sequences with QChar::replacement (0xfffd) in the 
QString. When such a QString is converted back to utf8, the resulting 8bit 
string is a valid utf8 sequence but is no longer identical to the original 
8bit string.

I would like to propose that QFile::decodeName/encodeName use a modified utf8 
codec such that the conversion utf8 -> QString -> utf8 always results in the 
original 8bit string, even if that string is not a valid utf8 sequence.

I have attached a patch that illustrates what such a modified codec could look 
like. I have used 0xfffd as the escape character; maybe another character, 
such as 0xffff, would be more suitable.

I am aware that this very problem can be solved for KDE applications by 
providing our own encoding function via QFile::setEncodingFunction, but since 
this problem will affect Qt-only applications as well, it would be better if 
it could be solved in Qt itself.

Cheers,
Waldo
- -- 
bastian at kde.org -=|[ SuSE, The Linux Desktop Experts ]|=- bastian at suse.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE+31a0N4pvrENfboIRAsfbAJwN+MbulYJilSJE+02rBjaNE/DjTACfaesg
QXNBB8Out2vQG5FciuggYaw=
=3pdc
-----END PGP SIGNATURE-----
