Add stripAccents to KStringHandler
Thiago Macieira
thiago at kde.org
Fri Oct 19 09:44:14 BST 2007
On Thursday 18 October 2007 19:45:23 Frederik Gladhorn wrote:
> > > I also think that a more generic mechanism, like stringprep profiles,
> > > is better than "removing accents" for string matching purposes.
> > > Other people already care about locale-aware semantic matching, we
> > > should care about giving it a nice API (to free people from the madness
> > > of ICU etc.) instead of wasting our manpower to reinvent mapping
> > > mechanisms.
> >
> > Stringprep does make sense.
> >
> > Qt implements the IDNA stringprep only (a.k.a. nameprep RFC 3491).
> Can you point me to further material on stringprep? I looked briefly but
> could not find anything really helpful.
RFC 3454, "Preparation of Internationalized Strings" (stringprep).

Abstract:

   This document describes a framework for preparing Unicode text
   strings in order to increase the likelihood that string input and
   string comparison work in ways that make sense for typical users
   throughout the world. The stringprep protocol is useful for protocol
   identifier values, company and personal names, internationalized
   domain names, and other text strings.

   This document does not specify how protocols should prepare text
   strings. Protocols must create profiles of stringprep in order to
   fully specify the processing options.

RFC 3491 is basically saying "We choose the following tables in RFC 3454: ...
And here's why: ..."
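
To make "choosing tables" a bit more concrete, here is a rough sketch in Python, whose standard library happens to ship the RFC 3454 tables in its stringprep module. The profile itself (toy_profile, and the particular tables it picks) is made up for illustration. It is not nameprep and not what Qt does internally, and it leaves out the bidirectional and unassigned-code-point checks that a real profile has to specify.

```python
import stringprep
import unicodedata


def toy_profile(label: str) -> str:
    """Hypothetical stringprep profile, for illustration only (NOT nameprep)."""
    # Step 1, mapping: drop characters "commonly mapped to nothing"
    # (table B.1) and case-fold the rest (table B.2).
    mapped = []
    for ch in label:
        if stringprep.in_table_b1(ch):
            continue
        mapped.append(stringprep.map_table_b2(ch))

    # Step 2, normalization: this profile picks NFKC.
    normalized = unicodedata.normalize("NFKC", "".join(mapped))

    # Step 3, prohibited output: this profile rejects non-ASCII control
    # characters (table C.2.2) and private-use characters (table C.3).
    for ch in normalized:
        if stringprep.in_table_c22(ch) or stringprep.in_table_c3(ch):
            raise ValueError("prohibited character U+%04X" % ord(ch))

    return normalized


if __name__ == "__main__":
    print(toy_profile("Stra\u00dfe"))   # sharp s is folded by table B.2
    print(toy_profile("KDE\u00ad"))     # soft hyphen is dropped by table B.1
```

A real profile is exactly this kind of list (which mapping tables, which normalization form, which prohibition tables) plus the rationale for each choice.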
> Right now I just want to be able to give feedback of the kind "you have
> made an accent mistake", instead of just saying "you are wrong".
> KHangMan and Kanagram will use it to check if the right letter was entered
> because for the HangMan game for example using all accents and everything
> can be a little too complicated (target audience are children).
> These are all rather limited cases and I hope we won't run into trouble
> with other languages.
> As we all know computer linguistics is not an easy field, so I would be
> happy about further suggestions. I also started looking at some
> spellchecker code.
>
> I agree with Torsten insofar as that after reading the Qt docs I didn't
> realize the unicode decomposition was what I wanted. Maybe some hint could
> be added to the documentation. I realized this should be possible with Qt
> only after reading a lot of the unicode.org stuff (not fun) and then some
> of the ICU docs. Afterwards the Qt docs made more sense, but still tough.
Technically, the NFD and NFKD Unicode normalisation forms are what you wanted,
and the Qt documentation does provide a way of getting to them. The problem is
not the Qt documentation, but rather working out what it is that you want.
While I do agree that the QString and QChar documentation could give a brief
glimpse of what normalisation means in the Unicode context, it is not its
place to explain the whole thing. It is all very technical and covered in much
more detail on the Unicode website.
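
For the record, the decomposition approach can be sketched in a few lines. This is Python rather than Qt code, only because its standard library makes the idea easy to try out; in Qt the corresponding call would be QString::normalized() with QString::NormalizationForm_D. The names strip_accents and check_guess are made up for this example, and dropping every combining mark is an assumption about what "an accent" means that will not suit every language.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks such as
    # U+0301 COMBINING ACUTE ACCENT, and 0 for base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def check_guess(guess: str, expected: str) -> str:
    """Classify a guess as correct, an accent mistake, or wrong."""
    # Compare in NFC first so that precomposed and decomposed spellings
    # of the same letter count as equal.
    if unicodedata.normalize("NFC", guess) == unicodedata.normalize("NFC", expected):
        return "correct"
    if strip_accents(guess) == strip_accents(expected):
        return "accent mistake"
    return "wrong"


if __name__ == "__main__":
    print(strip_accents("\u00e9l\u00e8ve"))
    print(check_guess("e", "\u00e9"))
```

Note that letters without a canonical decomposition (for example U+00F8, the Danish/Norwegian o with stroke) pass through unchanged, which is one more reason this is only good for limited cases like the ones you describe.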
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358