QUrl vs KURL - here's some benchmark results for you

Thiago Macieira thiago at kde.org
Thu Jun 2 23:39:27 BST 2005


Thiago Macieira wrote:
> Of course, this doesn't come without problems: local files. A local
> .html page could make a reference to <a href="Résumé.pdf">. IRI would
> mandate that the file referenced be BASE/R%C3%A9sum%C3%A9.pdf, but if
> the user doesn't use UTF-8 for his local filename encodings, this won't
> work.

Thinking about it again: if you retrieve the pathname from the URL as a
QString, the Unicode form is returned, and that is in turn passed to
QFile::encodeName to get the correct byte representation for the local
filename encoding.

I.e.:
File name on disk: 0xE9 (Latin1-encoded) or 0xC3 0xA9 (UTF-8-encoded)
URL: %C3%A9
Unicode QString: U+00E9
After QFile::encodeName: 0xE9 on the Latin1 system (0xC3 0xA9 on the UTF-8 one)

This would solve this problem.
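
To make the chain above concrete, here is a minimal sketch in Python rather than Qt. The function name url_path_to_local_bytes and its filesystem_encoding parameter are made up for illustration; the second step only stands in for what QFile::encodeName does with the local encoding, it is not Qt's code.

```python
from urllib.parse import unquote


def url_path_to_local_bytes(percent_encoded: str, filesystem_encoding: str) -> bytes:
    """Hypothetical helper: URL path component -> bytes of the local file name."""
    # Step 1: percent-decode and interpret the bytes as UTF-8, as IRI mandates.
    # errors="strict" makes invalid UTF-8 raise instead of being replaced.
    unicode_name = unquote(percent_encoded, encoding="utf-8", errors="strict")
    # Step 2: re-encode the Unicode text in the local filename encoding
    # (the role QFile::encodeName plays in the text above).
    return unicode_name.encode(filesystem_encoding)


resume_latin1 = url_path_to_local_bytes("R%C3%A9sum%C3%A9.pdf", "latin-1")
resume_utf8 = url_path_to_local_bytes("R%C3%A9sum%C3%A9.pdf", "utf-8")
```

The same URL therefore yields different bytes depending on the local encoding, which is the point: the URL stays UTF-8, the file name does not have to be.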

Now to the next problem:
What if the URL contained %FF? The byte 0xFF is not valid in any
position of a UTF-8 sequence, so it cannot be converted to Unicode (and
it cannot be mapped to U+00FF, because U+00FF is what %C3%BF stands for).
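
A quick way to check both halves of that claim, again sketched in Python and not in Qt (the helper name is_valid_utf8 is an assumption of this example):

```python
from urllib.parse import quote, unquote_to_bytes


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        raw.decode("utf-8")  # strict decoding: raises on any invalid byte
        return True
    except UnicodeDecodeError:
        return False


ff_bytes = unquote_to_bytes("%FF")   # the single raw byte 0xFF
ff_is_utf8 = is_valid_utf8(ff_bytes)

# U+00FF already has its own spelling in a UTF-8 URL, so %FF cannot mean it.
u00ff_in_url = quote("\u00ff")
```

Here ff_is_utf8 comes out False, and u00ff_in_url is the two-byte form %C3%BF.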

It could be done using QString's hack/extension to Unicode, which allows
individual arbitrary bytes to be encoded with UTF-16 surrogate pairs.
The problem then is that only the UTF-8 encoder/decoder knows about them:
all the other text codecs will happily turn them all into '?'.
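
I can't show QString's mechanism here, but Python has a similar trick that illustrates the same trade-off: its "surrogateescape" error handler smuggles each undecodable byte through as a lone surrogate code point. This is only an analogy under that assumption, not Qt's implementation, and the variable names are made up for the example.

```python
raw_name = b"caf\xff"  # contains 0xFF, which is never valid UTF-8

# Decode, mapping the bad byte 0xFF to the lone surrogate U+DCFF
# instead of failing.
smuggled = raw_name.decode("utf-8", errors="surrogateescape")

# A codec that knows the trick restores the original byte exactly.
round_trip = smuggled.encode("utf-8", errors="surrogateescape")

# A codec that does not know it has nothing to map the surrogate to,
# so with errors="replace" the byte degrades to '?'.
lossy = smuggled.encode("latin-1", errors="replace")
```

The round trip only works while both ends agree on the convention; anything else in between sees an unencodable character.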

-- 
  Thiago Macieira  -  thiago (AT) macieira (DOT) info
    PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

2. Tó cennan his weorc gearu, ymbe se circolwyrde, wearð se cægbord and se 
leohtspeccabord, and þa mýs cómon lator. On þone dæg, he hine reste.

More information about the kde-core-devel mailing list