[Kde-pim] Fwd: Re: KDE 4.4.98 (4.4 RC3)

Johannes Sixt j.sixt at viscovery.net
Tue Feb 9 08:13:52 GMT 2010


Thiago Macieira wrote:
> On Monday 8 February 2010, at 21:15:51, Albert Astals Cid wrote:
>> On Monday, 8 February 2010, Thiago Macieira wrote:
>>> But QString can handle UTF-16 surrogate pairs and does it just fine. The
>>> sequence 0xD83F 0xDFFF is the U+1FFFF non-character.
>>>
>>> The question is: should those be allowed to exist in a QString? (I think
>>> the answer is yes.)
>>>
>>> Should QString::toUtf8 and fromUtf8 accept those?
>> From what I understand, they are not valid UTF-8 (just valid UTF-16), so I
>> think the obvious answer (from the I-have-no-idea-what-I'm-talking-about
>> position) is "No".
> 
> While I agree with you, I have to ask: why?
> 
> Why are they valid UTF-16 and valid UCS-4 but not valid UTF-8?

It is not valid UTF-8 to write the surrogate pair 0xD83F 0xDFFF as two
separate UTF-8-encoded byte sequences (0xED 0xA0 0xBF 0xED 0xBF 0xBF); that
is what CESU-8 does. The correct way is to encode U+1FFFF as a single
UTF-8-encoded byte sequence, 0xF0 0x9F 0xBF 0xBF.

http://en.wikipedia.org/wiki/CESU-8
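A small sketch of the difference described above, in Python rather than Qt (it does not use QString::toUtf8/fromUtf8). The helpers combine_surrogates, utf8_encode, cesu8_encode_pair and is_strict_utf8 are hypothetical names written for illustration only. It combines the pair 0xD83F 0xDFFF into U+1FFFF, encodes it the UTF-8 way (one 4-byte sequence) and the CESU-8 way (two 3-byte sequences), and then checks both byte strings against Python's own strict UTF-8 decoder.

```python
# Hypothetical illustration helpers; not part of Qt or any KDE API.

def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8; lone surrogates are rejected.
    Noncharacters such as U+1FFFF are still valid scalar values."""
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

def cesu8_encode_pair(high: int, low: int) -> bytes:
    """CESU-8 style: each surrogate gets its own 3-byte sequence."""
    def three_bytes(u: int) -> bytes:
        return bytes([0xE0 | (u >> 12), 0x80 | ((u >> 6) & 0x3F),
                      0x80 | (u & 0x3F)])
    return three_bytes(high) + three_bytes(low)

def is_strict_utf8(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

cp = combine_surrogates(0xD83F, 0xDFFF)
print(f"U+{cp:04X}")                               # U+1FFFF
print(utf8_encode(cp).hex(" "))                    # f0 9f bf bf
print(cesu8_encode_pair(0xD83F, 0xDFFF).hex(" "))  # ed a0 bf ed bf bf
print(is_strict_utf8(utf8_encode(cp)))             # True
print(is_strict_utf8(cesu8_encode_pair(0xD83F, 0xDFFF)))  # False
```

The hand-written encoder matches chr(0x1FFFF).encode("utf-8"), which shows that a strict UTF-8 codec accepts the noncharacter itself; what it refuses is the surrogate-by-surrogate (CESU-8) form of it.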

-- Hannes




More information about the kde-core-devel mailing list