KURL problem
Thiago Macieira
thiago at kde.org
Tue Sep 27 18:18:00 BST 2005
David Faure wrote:
>However the real question is: do you remember why you asked for whether
> "%E1.foo" was a valid URL? It is parsed by the current KURL, but Thiago
> tells me that this url doesn't have a defined interpretation.
I guess we were trying to test á.foo, but you changed it to %E1.foo to
avoid encoding issues.
Let me explain why %E1 is not valid in the hostname part: URLs are
supposed to be UTF-8 binary byte sequences. That means percent-encoded
non-ASCII bytes are supposed to be converted into characters when they
form a valid UTF-8 sequence (e.g., %C3%A1), but invalid sequences (e.g.,
%E1) are not supposed to be discarded: they have to be preserved.
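
To make that rule concrete, here is a minimal sketch in Python. It is not
what KURL does internally: the function name decode_if_valid_utf8 is made
up for illustration, and treating the whole component at once (rather than
each percent-encoded run separately) is a simplifying assumption.

```python
from urllib.parse import unquote_to_bytes


def decode_if_valid_utf8(component: str) -> str:
    """Hypothetical helper: decode percent-escapes only if the result is valid UTF-8.

    Simplifying assumption: the component is handled as a whole, so one
    invalid byte leaves every escape in it untouched.
    """
    raw = unquote_to_bytes(component)  # "%C3%A1" -> b"\xc3\xa1"
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid sequence: do not discard anything, keep the original text.
        return component


print(decode_if_valid_utf8("%C3%A1.foo"))  # valid UTF-8, decoded to characters
print(decode_if_valid_utf8("%E1.foo"))     # lone 0xE1 byte, left percent-encoded
```

%E1 is what á looks like in Latin-1, which is why it is not a valid UTF-8
sequence on its own.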
However, hostnames are Unicode strings, so you can't have invalid high-bit
sequences in UTF-8 representation (and they are impossible in UTF-16
representation). /usr/bin/idn behaves the same way:
(run in a UTF-8 environment)
$ echo á.foo | iconv -t latin1 | idn -a --quiet
idn: idna_to_ascii_4z: String preparation failed
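
Roughly the same check can be sketched with Python's built-in "idna"
codec (which follows IDNA 2003). This only illustrates the idea and is
not how idn or KURL are implemented; hostname_to_ascii is a hypothetical
name.

```python
def hostname_to_ascii(raw: bytes) -> str:
    """Hypothetical helper: turn raw hostname bytes into the ASCII (ACE) form.

    The bytes must first be a valid UTF-8 string; otherwise there is no
    Unicode hostname to convert and UnicodeDecodeError is raised.
    """
    text = raw.decode("utf-8")
    return text.encode("idna").decode("ascii")


# Proper UTF-8 input is converted to its ACE form.
print(hostname_to_ascii("á.foo".encode("utf-8")))

# Latin-1 input (the lone byte 0xE1) is rejected before IDNA is even tried.
try:
    hostname_to_ascii("á.foo".encode("latin-1"))
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```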
BTW, Qt has for some time allowed those non-UTF8 sequences to be converted
into QStrings and back, by using some reserved characters in UTF-16.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
2. Tó cennan his weorc gearu, ymbe se circolwyrde, wearð se cægbord and se
leohtspeccabord, and þa mýs cómon lator. On þone dæg, he hine reste.