[Kde-pim] Fwd: Re: KDE 4.4.98 (4.4 RC3)

argonel argonel at gmail.com
Mon Feb 8 23:43:12 GMT 2010


On Mon, Feb 8, 2010 at 5:10 PM, Thiago Macieira <thiago at kde.org> wrote:

> On Monday, 8 February 2010, at 21:15:51, Albert Astals Cid wrote:
> > On Monday, 8 February 2010, Thiago Macieira wrote:
> > > On Sunday, 7 February 2010, at 16:33:34, argonel wrote:
> > > > On Sun, Feb 7, 2010 at 3:58 AM, Thiago Macieira <thiago at kde.org> wrote:
> > > > > The protection has to happen somewhere. Technically, it's
> > > > > Konversation's fault for passing unfiltered network data into
> > > > > an API.
> > > > >
> > > > > But it could also be a QString issue, for allowing those invalid
> > > > > UTF-8 strings to be converted to UTF-16 in the first place.
> > > > >
> > > > > Note that changing the D-Bus behaviour may likely introduce bugs in
> > > > > Glib-based applications, where conversions from UTF-8 do implement
> > > > > this check. (Which, in my opinion, is incomplete)
> > > >
> > > > If you're referring to dbus's lack of checks for 0x1FFFE and so on, I
> > > > found that I was unable to create a QChar > 0xFFFF, so perhaps not
> > > > checking those is reasonable.
> > >
> > > Of course you can't create a QChar > 0xFFFF.
> > >
> > > But QString can handle UTF-16 surrogate pairs and does it just fine.
> > > The sequence 0xD83F 0xDFFF is the U+1FFFF non-character.
> > >
> > > The question is: should those be allowed to exist in a QString? (I
> > > think the answer is yes)
> > >
> > > Should QString::toUtf8 and fromUtf8 accept those?
> >
> > From what I understand, they are not valid UTF-8 (just valid UTF-16), so
> > I think the obvious answer (from the "I have no idea what I'm talking
> > about" position) is "No".
>
> While I agree with you, I have to ask: why?
>
> Why are they valid UTF-16 and valid UCS-4 but not valid UTF-8?
>
>
RFC 3629 section 3 says:

 "The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16
encoding form (as surrogate pairs) and do not directly represent
characters. When encoding in UTF-8 from UTF-16 data, it is necessary
to first decode the UTF-16 data to obtain character numbers which
are then encoded in UTF-8 as described above."
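
The two steps the RFC describes (decode the UTF-16 pair to a character
number, then encode that number as UTF-8) can be sketched roughly as
follows. This is only an illustration in Python, not Qt's implementation;
the function names decode_surrogate_pair and encode_utf8 are made up for
this example.

```python
def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8(code_point: int) -> bytes:
    """Encode one code point as UTF-8, refusing surrogates (RFC 3629, section 3)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# The pair quoted earlier in the thread: 0xD83F 0xDFFF is U+1FFFF.
code_point = decode_surrogate_pair(0xD83F, 0xDFFF)
utf8_bytes = encode_utf8(code_point)
```

For the pair mentioned above, this yields the code point U+1FFFF and the
four bytes F0 9F BF BF. Note that the sketch only refuses surrogates; it
does not take a position on non-characters such as U+1FFFF.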

If QString uses UTF-16 internally, QString::toUtf8 should convert surrogate
pairs to valid UTF-8, and QString::fromUtf8 should reject UTF-8-encoded
surrogates and provide some kind of error feedback.
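
As a rough illustration of what "error feedback" could look like, the
sketch below uses Python's built-in strict UTF-8 decoder as a stand-in.
It says nothing about how Qt behaves; from_utf8_checked is a hypothetical
helper invented for this example.

```python
def from_utf8_checked(data: bytes):
    """Hypothetical helper: return (text, None) on success,
    or (None, reason) if the input is not well-formed UTF-8."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, f"invalid UTF-8 at byte {err.start}: {err.reason}"


# Well-formed: U+1FFFF encoded as four bytes.
good_text, good_error = from_utf8_checked(b"\xf0\x9f\xbf\xbf")

# Ill-formed: the surrogates 0xD83F and 0xDFFF each encoded as three bytes,
# which RFC 3629 prohibits.
bad_text, bad_error = from_utf8_checked(b"\xed\xa0\xbf\xed\xbf\xbf")
```

Here the first call returns the decoded string with no error, while the
second returns no text and a message describing where decoding failed.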


> --
> Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
>  Senior Product Manager - Nokia, Qt Development Frameworks
>      PGP/GPG: 0x6EF45358; fingerprint:
>      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
>