RFC: Encoding of filenames [long]

Waldo Bastian bastian at kde.org
Thu Jun 5 16:20:38 BST 2003

On Thursday 05 June 2003 14:21, Thiago Macieira wrote:
> Adding to all that, there's the URL problem. URLs are supposed to be 8-bit
> encoded and, as far as the current standards go (from what I can tell),
> UTF-8. I managed to resolve the domain part of the issue -- I hope --, but
> Konqueror still fails the two tests shown in bug #55177. The major problem
> with those is that the encoding is NOT backwards compatible with many sites
> out there that use non-encoded URIs. By being compliant, I'm sure we'll get
> a lot of bug reports that Konqueror doesn't load the right images or go to
> the right sites.

The HTML 4.01 recommendation (Appendix B.2.1) has the following note:

"Note. Some older user agents trivially process URIs in HTML using the bytes 
of the character encoding in which the document was received. Some older HTML 
documents rely on this practice and break when transcoded. User agents that 
want to handle these older documents should, on receiving a URI containing 
characters outside the legal set, first use the conversion based on UTF-8. 
Only if the resulting URI does not resolve should they try constructing a URI 
based on the bytes of the character encoding in which the document was
received."
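
The fallback order described in the note could be sketched roughly as follows. This is only an illustration in Python, not Konqueror's actual code: the names `candidate_uris`, `resolve_with_fallback` and `SAFE` are made up for the example, and `resolves` is a hypothetical callback standing in for whatever network request decides whether a URI resolves.

```python
from urllib.parse import quote

# Characters left untouched when escaping: URI delimiters plus "%",
# so escapes already present in the URI are not encoded twice.
SAFE = "/:?#[]@!$&'()*+,;=%"


def candidate_uris(uri, doc_encoding):
    """Return the escaped forms of `uri` to try, in the order the note suggests."""
    # First choice: escape the bytes of the UTF-8 representation.
    candidates = [quote(uri, safe=SAFE, encoding="utf-8")]
    try:
        # Fallback: escape the bytes of the document's own encoding.
        legacy = quote(uri, safe=SAFE, encoding=doc_encoding, errors="strict")
    except UnicodeEncodeError:
        # The document encoding cannot represent the URI; no fallback exists.
        legacy = None
    if legacy is not None and legacy not in candidates:
        candidates.append(legacy)
    return candidates


def resolve_with_fallback(uri, doc_encoding, resolves):
    """Return the first candidate accepted by `resolves`, or None."""
    for candidate in candidate_uris(uri, doc_encoding):
        if resolves(candidate):
            return candidate
    return None
```

For a pure-ASCII URI both encodings give the same bytes, so only one candidate is tried. The cost of the fallback is an extra request for every legacy link whose UTF-8 form does not resolve.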

I think the W3C is very well aware that using UTF-8 only in such cases would
break a zillion sites. I would like to hear about some real-world sites that
actually depend on the UTF-8 behavior.

-- 
bastian at kde.org -=|[ SuSE, The Linux Desktop Experts ]|=- bastian at suse.com


More information about the kde-core-devel mailing list