[PATCHES] Improvements to the International Domain URLs

Thiago Macieira thiagom at wanadoo.fr
Tue May 27 23:06:22 BST 2003


Hello,

I've been working today on getting IDN hostnames in URLs to work. After a bit
of work, IDN now seems to work: KURL correctly handles hostnames outside the
basic ASCII character set.

Unlike the previous attempts, I've made KURL record the Unicode hostname in
m_strHost. I've therefore removed the KURL::prettyHost() function, which was
only introduced after the 3.1 release anyway. The rationale is that the
hostname must be treated as a Unicode string, and each protocol must decide
which encoding to use when transmitting it as 8-bit characters.

(There are KIDNA::toAscii and QResolver::domainToAscii for that purpose.)

The only problem left for now is the encoding of hostnames in error:/ URLs.
The sub URL works fine, but the error *text* becomes garbled when characters
outside the user's locale are used. The reason is that when KURL detects that
the default encoding would lose information, it opts to encode the query in
UTF-8. That would be ok, except that the decoding side no longer knows it's
UTF-8.

Therefore, when trying a fictitious machine like www.multimǽdia.fr, I get an
error message saying that the host doesn't exist, but with the hostname
garbled in the text. Note that this doesn't happen if the user's locale is
already UTF-8.
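
The mismatch can be reproduced outside KDE. The following is only a minimal
sketch in Python using the standard library, not KURL code: encode_query and
decode_query are hypothetical helpers, the error text is made up, and Latin-1
is assumed as the user's locale encoding.

```python
from urllib.parse import quote, unquote

def encode_query(text: str, locale_encoding: str) -> str:
    """Percent-encode in the locale encoding; fall back to UTF-8 if lossy."""
    try:
        text.encode(locale_encoding)
    except UnicodeEncodeError:
        # The locale encoding cannot represent the text, so use UTF-8.
        # Nothing in the result records that this fallback happened.
        return quote(text, encoding="utf-8")
    return quote(text, encoding=locale_encoding)

def decode_query(encoded: str, locale_encoding: str) -> str:
    """Decode assuming the locale encoding, as the receiving side would."""
    return unquote(encoded, encoding=locale_encoding)

# Hypothetical error text containing the hostname from the example above.
message = "Unknown host www.multim\u01fddia.fr"

encoded = encode_query(message, "latin-1")   # falls back to UTF-8
garbled = decode_query(encoded, "latin-1")   # UTF-8 bytes read as Latin-1
intact = decode_query(encode_query(message, "utf-8"), "utf-8")

print(encoded)
print(garbled)
print(intact)
```

With a UTF-8 locale both sides agree, which matches the observation that the
problem does not show up there.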

Summary of changes:
- there is no longer an "encoded" form for hostnames in KURL: they are always
Unicode.
[IMPORTANT] Therefore, KURL::host() may return non-ASCII characters. It's up
to the protocols to decide when and how to encode.

- when an ACE-encoded hostname is passed to KURL or setHost, it's converted to 
its Unicode form

- in Konqueror, typing a hostname will transform it to its normalised form, 
including ACE -> Unicode conversion, thanks to the two items above

- resolver functions use the Unicode form of the hostnames. There is no need
to encode them with KIDNA::toAscii beforehand.

- in kio_http, I've made it use the Unicode form throughout, except for the 
few places where the encoded form must be used. Note here that I've made it 
so that m_request.encoded_hostname already contains the square brackets 
needed for numerical IPv6 addresses.

- in kio_smb, I've removed the remaining calls to prettyHost. If there are any 
others out there, they have to be removed.
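
As an illustration of the conversions listed above, here is a minimal sketch
in Python using its built-in "idna" codec (IDNA 2003). It is not the KURL or
kio_http code: to_unicode_host and host_for_wire are hypothetical names, and
treating any hostname that contains a colon as a numerical IPv6 address is an
assumption made for brevity.

```python
def to_unicode_host(host: str) -> str:
    """Return the Unicode form of a hostname given as Unicode or ACE."""
    ace = host.encode("idna")      # Unicode or ACE text -> ACE bytes
    return ace.decode("idna")      # ACE bytes -> Unicode text

def host_for_wire(host: str) -> str:
    """Return the form a protocol might send over an ASCII-only channel."""
    if ":" in host:
        # Assumed to be a numerical IPv6 address: add the square brackets.
        return "[" + host + "]"
    # Otherwise encode each label to its ACE ("xn--") form where needed.
    return host.encode("idna").decode("ascii")

print(to_unicode_host("xn--bcher-kva.de"))
print(host_for_wire("b\u00fccher.de"))
print(host_for_wire("::1"))
```

The point of the split is the same as in the list: the stored hostname stays
Unicode, and only the code that talks to the network asks for the encoded
form.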

If the patches are ok, I'd like to see them committed.
-- 
  Thiago Macieira  -  Registered Linux user #65028
   thiagom at mail.com           
    ICQ UIN: 1967141   PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kio_smb.diff
Type: text/x-diff
Size: 1618 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20030528/f94ac1bb/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kio_http.diff
Type: text/x-diff
Size: 11204 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20030528/f94ac1bb/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kurl.diff
Type: text/x-diff
Size: 2907 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20030528/f94ac1bb/attachment-0002.diff>

