[PATCHES] Improvements to the International Domain URLs
Thiago Macieira
thiagom at wanadoo.fr
Tue May 27 23:06:22 BST 2003
Hello,
I've been working today on getting IDN domains to work in URLs. After a bit
of work, IDN now seems to work: KURL correctly handles hostnames outside the
basic ASCII character set.
Unlike the previous attempts, I've made KURL record the Unicode hostname in
m_strHost. I've therefore removed the KURL::prettyHost() function, which was
introduced after the 3.1 release anyway. The rationale is that the hostname
must be treated as a Unicode string, and each protocol must decide which
encoding to use when transmitting it as 8-bit characters.
(KIDNA::toAscii and QResolver::domainToAscii exist for that purpose.)
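
To illustrate the kind of conversion those functions perform, here is a
minimal sketch in Python rather than KDE code. It assumes Python's built-in
"idna" codec (RFC 3490 ToASCII/ToUnicode) as a stand-in for KIDNA::toAscii;
the helper names to_ascii and to_unicode are made up for this example.

```python
def to_ascii(host: str) -> str:
    """Return the ACE (ASCII-compatible encoding) form of a Unicode hostname."""
    # The "idna" codec applies ToASCII to each dot-separated label.
    return host.encode("idna").decode("ascii")


def to_unicode(host: str) -> str:
    """Return the Unicode form of a hostname that may contain ACE labels."""
    return host.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_ascii("www.bücher.de"))           # www.xn--bcher-kva.de
    print(to_unicode("www.xn--bcher-kva.de"))  # www.bücher.de
```

The point of keeping the Unicode form in the URL object is that a conversion
like to_ascii only needs to happen at the moment a protocol puts the name on
the wire.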
The only problem left for now is the encoding of hostnames in error:/ URLs.
The sub-URL works fine, but the error *text* becomes garbled when characters
outside the user's locale are used. The reason is that when KURL detects that
the default encoding would lose information, it opts to encode the query in
UTF-8. That would be OK, except that the decoding side no longer knows the
query is UTF-8.
Therefore, when trying a fictitious machine like www.multimǽdia.fr, I get the
error message that www.multimǽdia.fr doesn't exist. Note that this doesn't
happen if the user's locale is UTF-8 already.
Summary of changes:
- there is no longer an "encoded" form for hostnames in KURL: they are always
Unicode
[IMPORTANT] Therefore, KURL::host() may return non-ASCII characters. It's up
to the protocols to decide when to encode and how
- when an ACE-encoded hostname is passed to KURL or setHost, it's converted to
its Unicode form
- in Konqueror, typing a hostname will transform it to its normalised form,
including ACE -> Unicode conversion, thanks to the two items above
- resolver functions use the Unicode form of the hostnames. There is no need
to encode them beforehand with KIDNA::toAscii.
- in kio_http, I've made it use the Unicode form throughout, except for the
few places where the encoded form must be used. Note here that I've made it
so that m_request.encoded_hostname already contains the square brackets
needed for numerical IPv6 addresses.
- in kio_smb, I've removed the remaining calls to prettyHost. If there are any
others out there, they have to be removed.
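
As a rough sketch of the host handling summarised above, again in Python and
not taken from the patches: normalise_host models the ACE -> Unicode
conversion on input, and encoded_hostname models a wire form that already
carries the square brackets for numerical IPv6 addresses. Both function names
are hypothetical, and Python's "idna" codec and ipaddress module stand in for
the KDE facilities.

```python
import ipaddress


def normalise_host(host: str) -> str:
    """Store hostnames in Unicode form, converting ACE labels if present."""
    try:
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # Not pure ASCII, so it is assumed to be in Unicode form already.
        return host


def encoded_hostname(host: str) -> str:
    """Return the form a protocol would transmit for this host."""
    try:
        if ipaddress.ip_address(host).version == 6:
            return f"[{host}]"  # numerical IPv6 needs square brackets
        return host             # numerical IPv4 is sent unchanged
    except ValueError:
        pass                    # not a numerical address, treat as a name
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(normalise_host("www.xn--bcher-kva.de"))  # www.bücher.de
    print(encoded_hostname("www.bücher.de"))       # www.xn--bcher-kva.de
    print(encoded_hostname("::1"))                 # [::1]
```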
If the patches are ok, I'd like to see them committed.
--
Thiago Macieira - Registered Linux user #65028
thiagom at mail.com
ICQ UIN: 1967141 PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kio_smb.diff
Type: text/x-diff
Size: 1618 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20030528/f94ac1bb/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kio_http.diff
Type: text/x-diff
Size: 11204 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20030528/f94ac1bb/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kurl.diff
Type: text/x-diff
Size: 2907 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20030528/f94ac1bb/attachment-0002.diff>