[Kde-pim] Fwd: Re: KDE 4.4.98 (4.4 RC3)
Thiago Macieira
thiago at kde.org
Tue Feb 9 15:02:58 GMT 2010
On Tuesday, 9 February 2010, at 10:13:52, Johannes Sixt wrote:
> Thiago Macieira wrote:
> > While I agree with you, I have to ask: why?
> >
> > Why are they valid UTF-16 and valid UCS-4 but not valid UTF-8?
>
> It is not valid UTF-8 to write the surrogate pair 0xD83F 0xDFFF as two
> separately UTF-8-encoded byte sequences. The correct way is to encode
> U+1FFFF as a single UTF-8-encoded byte sequence 0xF0 0x9F 0xBF 0xBF.
>
> http://en.wikipedia.org/wiki/CESU-8
QString correctly encodes UTF-16 surrogate pairs as their UTF-8 sequences,
like you said above.
But that was not the question. The question was: should the surrogate pair
0xD83F 0xDFFF be considered improper for UTF-8 encoding and dropped?
And the opposite: should the UTF-8 sequence 0xF0 0x9F 0xBF 0xBF be considered
incorrect and dropped?
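
For reference, the byte values quoted above can be checked with a short
sketch. This is only an illustration in Python, not how QString does it;
the helper names (combine_surrogates, encode_unit_3byte, is_noncharacter)
are made up for this example. It combines the surrogate pair into one code
point, encodes it as UTF-8, and contrasts that with the CESU-8 form in
which each surrogate is encoded separately. It also checks that U+1FFFF
falls in the set Unicode designates as noncharacters, which may be why the
"dropped" question comes up at all.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_unit_3byte(unit: int) -> bytes:
    """Encode one 16-bit unit in the three-byte UTF-8 pattern.

    Applying this to each surrogate separately yields the CESU-8 form,
    which is not valid UTF-8.
    """
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


def is_noncharacter(code_point: int) -> bool:
    """True for U+FDD0..U+FDEF and any code point ending in FFFE or FFFF."""
    return (0xFDD0 <= code_point <= 0xFDEF) or (code_point & 0xFFFE) == 0xFFFE


code_point = combine_surrogates(0xD83F, 0xDFFF)
utf8 = chr(code_point).encode("utf-8")
cesu8 = encode_unit_3byte(0xD83F) + encode_unit_3byte(0xDFFF)

print(f"U+{code_point:X}")           # U+1FFFF
print(utf8.hex(" "))                 # f0 9f bf bf
print(cesu8.hex(" "))                # ed a0 bf ed bf bf
print(is_noncharacter(code_point))   # True

# Python's strict UTF-8 decoder rejects the separately encoded surrogates.
try:
    cesu8.decode("utf-8")
    cesu8_accepted = True
except UnicodeDecodeError:
    cesu8_accepted = False
print(cesu8_accepted)                # False
```

Note that Python's own codec encodes and decodes U+1FFFF without
complaint, so it answers the second question above with "no, keep it".
Other implementations may choose differently.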
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Senior Product Manager - Nokia, Qt Development Frameworks
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358