KURL problem

Thiago Macieira thiago at kde.org
Tue Sep 27 18:18:00 BST 2005


David Faure wrote:
>However the real question is: do you remember why you asked for whether
> "%E1.foo" was a valid URL? It is parsed by the current KURL, but Thiago
> tells me that this url doesn't have a defined interpretation.

I guess we were trying to test á.foo, but you changed it to %E1.foo to 
avoid encoding issues.

Let me explain why %E1 is not valid in the hostname part: URLs are
supposed to be UTF-8 byte sequences. That means percent-encoded
non-ASCII bytes are supposed to be converted into characters when they
form a valid UTF-8 sequence (e.g., %C3%A1), while invalid sequences
(e.g., %E1) are supposed to be kept as they are, not discarded.
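
To illustrate the difference between the two cases, here is a minimal
sketch in Python. It is not KURL code; decode_component is a
hypothetical helper name, and it only assumes the rule described above
(valid UTF-8 becomes characters, anything else stays as raw bytes).

```python
from urllib.parse import unquote_to_bytes


def decode_component(component: str):
    """Percent-decode a URL component and try to read it as UTF-8.

    Returns (text, raw). text is None when the decoded bytes are not a
    valid UTF-8 sequence; raw always holds the undiscarded bytes.
    """
    raw = unquote_to_bytes(component)
    try:
        return raw.decode("utf-8"), raw
    except UnicodeDecodeError:
        # Invalid sequence: keep the bytes, but there are no characters.
        return None, raw


print(decode_component("%C3%A1.foo"))  # valid UTF-8 for U+00E1
print(decode_component("%E1.foo"))     # lone 0xE1 lead byte, invalid
```

The second call returns None for the text because 0xE1 starts a
three-byte UTF-8 sequence and is followed by "." instead of
continuation bytes.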

However, hostnames are Unicode strings, so they can't contain invalid
high-bit sequences in their UTF-8 representation (and such sequences
can't be expressed at all in UTF-16). /usr/bin/idn behaves the same way:

(run in a UTF-8 environment)
$ echo á.foo | iconv -t latin1 | idn -a --quiet
idn: idna_to_ascii_4z: String preparation failed
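
The same behaviour can be sketched with Python's built-in "idna" codec
(IDNA 2003). This is only an analogy to the idn run above, not what
KURL does: hostname_to_ascii is a hypothetical helper, and here the
Latin-1 input is rejected at the UTF-8 decoding step rather than during
string preparation.

```python
from typing import Optional


def hostname_to_ascii(raw: bytes) -> Optional[str]:
    """Convert hostname bytes to their ASCII-compatible (ACE) form.

    Returns None when the bytes are not valid UTF-8, since such input
    cannot be turned into the Unicode string a hostname has to be.
    """
    try:
        host = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    # The stdlib "idna" codec applies nameprep and Punycode per label.
    return host.encode("idna").decode("ascii")


print(hostname_to_ascii("á.foo".encode("utf-8")))    # xn--1ca.foo
print(hostname_to_ascii("á.foo".encode("latin-1")))  # None
```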

BTW, Qt has for some time allowed those non-UTF8 sequences to be converted 
into QStrings and back, by using some reserved characters in UTF-16.
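
For comparison only: Python has a similar round-trip mechanism, the
"surrogateescape" error handler, which maps each undecodable byte to a
lone surrogate in the range U+DC80..U+DCFF. The sketch below shows that
mechanism; it is an assumption that this resembles the Qt behaviour,
and the code points Qt reserves may well be different.

```python
raw = b"\xe1.foo"  # Latin-1 bytes, not valid UTF-8

# Decode without losing the invalid byte: 0xE1 becomes U+DCE1.
text = raw.decode("utf-8", errors="surrogateescape")
print(ascii(text))

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", errors="surrogateescape")
print(restored == raw)
```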

-- 
  Thiago Macieira  -  thiago (AT) macieira.info - thiago (AT) kde.org
    PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

2. Tó cennan his weorc gearu, ymbe se circolwyrde, wearð se cægbord and se 
leohtspeccabord, and þa mýs cómon lator. On þone dæg, he hine reste.