[Kde-pim] Fwd: Re: KDE 4.4.98 (4.4 RC3)

Thiago Macieira thiago at kde.org
Tue Feb 9 15:02:58 GMT 2010


On Tuesday 9 February 2010, at 10:13:52, Johannes Sixt wrote:
> Thiago Macieira wrote:
> > While I agree with you, I have to ask: why?
> > 
> > Why are they valid UTF-16 and valid UCS-4 but not valid UTF-8?
> 
> It is not valid UTF-8 to write the surrogate pair 0xD83F 0xDFFF as two
> separately UTF-8-encoded byte sequences. The correct way is to encode
> U+1FFFF as a single UTF-8-encoded byte sequence 0xF0 0x9F 0xBF 0xBF.
> 
> http://en.wikipedia.org/wiki/CESU-8

QString correctly encodes UTF-16 surrogate pairs as their UTF-8 sequences, 
like you said above.
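
To make the two byte sequences concrete, here is a minimal Python sketch. It is 
not Qt code, and the function names are made up for illustration. It combines 
the surrogate pair into a code point, encodes that as UTF-8, and, for contrast, 
encodes each surrogate separately the way CESU-8 does:

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf8_encode(value: int) -> bytes:
    """Apply the UTF-8 bit layout to one value.

    Deliberately does NOT reject surrogates, so that it can also be
    used to show the CESU-8 form below.
    """
    if value < 0x80:
        return bytes([value])
    if value < 0x800:
        return bytes([0xC0 | (value >> 6), 0x80 | (value & 0x3F)])
    if value < 0x10000:
        return bytes([0xE0 | (value >> 12),
                      0x80 | ((value >> 6) & 0x3F),
                      0x80 | (value & 0x3F)])
    return bytes([0xF0 | (value >> 18),
                  0x80 | ((value >> 12) & 0x3F),
                  0x80 | ((value >> 6) & 0x3F),
                  0x80 | (value & 0x3F)])


def cesu8_encode_pair(high: int, low: int) -> bytes:
    """Encode each surrogate on its own (CESU-8 style, not valid UTF-8)."""
    return utf8_encode(high) + utf8_encode(low)


cp = combine_surrogates(0xD83F, 0xDFFF)
print(hex(cp))                                   # 0x1ffff
print(utf8_encode(cp).hex())                     # f09fbfbf
print(cesu8_encode_pair(0xD83F, 0xDFFF).hex())   # eda0bfedbfbf
```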

But that was not the question. The question was whether the surrogate pair 
0xD83F 0xDFFF should be considered improper for UTF-8 encoding and dropped. 
And the converse: should the UTF-8 sequence 0xF0 0x9F 0xBF 0xBF be considered 
incorrect and dropped?
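
A small Python sketch of why I ask, assuming the objection is that U+1FFFF is 
one of the Unicode noncharacters (U+FDD0..U+FDEF plus every code point ending 
in FFFE or FFFF). The helper names are hypothetical and this is not how 
QString does it. The point is only that both encoded forms decode to the same 
code point, so a check on the code point treats them alike:

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def from_utf16_pair(high: int, low: int) -> int:
    """Decode a UTF-16 surrogate pair (assumed valid) to a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def from_utf8_4byte(b: bytes) -> int:
    """Decode a 4-byte UTF-8 sequence (assumed well-formed) to a code point."""
    return (((b[0] & 0x07) << 18) | ((b[1] & 0x3F) << 12)
            | ((b[2] & 0x3F) << 6) | (b[3] & 0x3F))


a = from_utf16_pair(0xD83F, 0xDFFF)
b = from_utf8_4byte(bytes([0xF0, 0x9F, 0xBF, 0xBF]))
print(a == b)                                   # True
print(is_noncharacter(a), is_noncharacter(b))   # True True
```

Whether such a code point is then dropped or passed through is a policy 
decision; the sketch only shows that the decision does not depend on which 
encoding form the input arrived in.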

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Senior Product Manager - Nokia, Qt Development Frameworks
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358