Add stripAccents to KStringHandler

Fri Oct 19 09:44:14 BST 2007

On Thursday 18 October 2007 19:45:23 Frederik Gladhorn wrote:
> > > I also think that a more generic mechanism, like stringprep profiles,
> > > is better than "removing accents" for string matching purposes.
> > > Other people already care about locale-aware semantic matching, we
> > > should care about giving it a nice API (to free people from the madness
> > > of ICU etc.) instead of wasting our manpower to reinvent mapping
> > > mechanisms.
> >
> > Stringprep does make sense.
> >
> > Qt implements the IDNA stringprep only (a.k.a. nameprep RFC 3491).

> Can you point me to further material on stringprep? I looked briefly but
> could not find anything really helpful.

RFC 3454 "Preparation of Internationalized Strings" (stringprep)

Abstract:
   This document describes a framework for preparing Unicode text
   strings in order to increase the likelihood that string input and
   string comparison work in ways that make sense for typical users
   throughout the world.  The stringprep protocol is useful for protocol
   identifier values, company and personal names, internationalized
   domain names, and other text strings.

   This document does not specify how protocols should prepare text
   strings.  Protocols must create profiles of stringprep in order to
   fully specify the processing options.

RFC 3491 is basically saying "We choose the following tables in RFC 3454: ... 
And here's why: ..."
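
To make "choosing tables" a bit more concrete, here is a rough sketch in
Python, whose standard library happens to ship the RFC 3454 tables in its
stringprep module. The function name nameprep_like is made up for this
example, and it deliberately skips the bidi and unassigned-code-point checks
that the real profile requires, so read it as an illustration of how a
profile is assembled and not as an implementation of RFC 3491:

```python
import stringprep
import unicodedata


def nameprep_like(label: str) -> str:
    """Simplified, nameprep-style profile (hypothetical helper, not RFC 3491)."""
    # Step 1, map: drop characters "commonly mapped to nothing" (table B.1)
    # and case-fold the rest (table B.2).
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # Step 2, normalise: compatibility composition (NFKC).
    normalised = unicodedata.normalize("NFKC", mapped)
    # Step 3, prohibit: reject a few of the C tables. A real profile lists
    # more tables and also runs the bidi checks; both are omitted here.
    for ch in normalised:
        if (
            stringprep.in_table_c12(ch)        # non-ASCII space characters
            or stringprep.in_table_c21_c22(ch)  # control characters
            or stringprep.in_table_c3(ch)       # private use
        ):
            raise ValueError("prohibited character U+%04X" % ord(ch))
    return normalised
```

With this, nameprep_like("Straße") and nameprep_like("STRASSE") both come out
as "strasse", which is the kind of matching a profile is meant to give you.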

> Right now I just want to be able to give feedback of the kind "you have
> made an accent mistake", instead of just saying "you are wrong".
> KHangMan and Kanagram will use it to check if the right letter was entered
> because for the HangMan game for example using all accents and everything
> can be a little too complicated (target audience are children).
> These are all rather limited cases and I hope we won't run into trouble
> with other languages.
> As we all know computer linguistics is not an easy field, so I would be
> happy about further suggestions. I also started looking at some
> spellchecker code.
>
> I agree with Torsten insofar as that after reading the Qt docs I didn't
> realize the unicode decomposition was what I wanted. Maybe some hint could
> be added to the documentation. I realized this should be possible with Qt
> only after reading a lot of the unicode.org stuff (not fun) and then some
> of the ICU docs. Afterwards the Qt docs made more sense, but still tough.

Technically, the NFD and NFKD Unicode normalisation forms are what you wanted,
and the Qt documentation does provide a way of getting to them. The problem is
not the Qt documentation, but rather working out what it is you want.
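
For the "accent mistake" feedback case, the decomposition approach can be
sketched roughly as follows. This is Python, only to keep the example
self-contained; in Qt the corresponding call is
QString::normalized(QString::NormalizationForm_D). The names strip_accents and
accent_mistake are invented for this example, and dropping every character
with a non-zero combining class is an assumption that suits simple
Latin-script cases, not a general linguistic rule:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD, then drop the combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns the canonical combining class, which is
    # non-zero for marks such as U+0301 COMBINING ACUTE ACCENT.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_mistake(guess: str, answer: str) -> bool:
    """True if guess and answer differ only in their accents."""
    # Compare in NFC first so that precomposed and decomposed spellings of
    # the same letter are not reported as a mistake.
    same = unicodedata.normalize("NFC", guess) == unicodedata.normalize("NFC", answer)
    return not same and strip_accents(guess) == strip_accents(answer)
```

Note that letters without a canonical decomposition, such as "ø", pass
through unchanged, so this does not cover every language.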

While I do agree that the QString & QChar documentation could give a brief
glimpse of what normalisation means in the Unicode context, it's not its place
to explain the whole thing. It's all very technical and covered in much more
detail on the Unicode website.

-- 
  Thiago Macieira  -  thiago (AT) macieira.info - thiago (AT) kde.org
    PGP/GPG: 0x6EF45358; fingerprint:
    E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358