Fixing and regulating certain types of search fields across KF5 apps

Eike Hein hein at kde.org
Tue Feb 10 13:13:42 UTC 2015



On 02/10/2015 01:01 PM, Aleix Pol wrote:
> I like the idea.
> Have you checked whether ICU provides something like this? They might...

To approach this more broadly: The basic problem here is that
not every character code point in Unicode stands for a single
phoneme; in the examples I mentioned, a character can be a
syllable or more. This makes a check on the number of characters
a poor check for information content, since fewer than three
characters can easily pack enough phonetic information to make
distinct words (consider also e.g. tonal languages, which exploit
dimensions of audio that English does not use for encoding
semantic value).

I figured we're not the first ones to hit this problem, so I
did some basic research on whether the character database has
enough metadata for a scoring algorithm and whether there's a
well-established scoring algorithm around (including looking
at ICU). So far I haven't found anything beyond the basic
idea of exploiting the character classification to assign
average phoneme counts -- but the good news is that as soon
as we centralize this into one implementation, we're free to
improve it later on (which I would expect to do as I hang out
more in this problem space in the future).
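
To make the basic idea concrete, here is a rough sketch of scoring by character classification. Everything in it is an assumption for illustration: the helper names, the use of std::u32string in place of QString, the choice of Unicode blocks, and above all the weights, which are guesses and not taken from ICU or any established algorithm.

```cpp
#include <string>

// Hypothetical weights: a rough guess at how many phonemes one
// code point carries on average, keyed on its Unicode block.
double approximatePhonemeWeight(char32_t cp)
{
    if (cp >= 0x4E00 && cp <= 0x9FFF)
        return 3.0; // CJK Unified Ideographs: often a whole word
    if (cp >= 0xAC00 && cp <= 0xD7A3)
        return 2.5; // precomposed Hangul syllables
    if (cp >= 0x3040 && cp <= 0x30FF)
        return 2.0; // Hiragana and Katakana: roughly one mora each
    return 1.0;     // everything else: treat as a single phoneme
}

// Sum the per-code-point weights to get an approximate phoneme count.
double approximatePhonemeScore(const std::u32string &text)
{
    double score = 0.0;
    for (char32_t cp : text)
        score += approximatePhonemeWeight(cp);
    return score;
}

// Stand-in for the proposed check: true once the text carries at
// least approximateLength "phonemes" according to the guesses above.
bool isMinimalSearchableLength(const std::u32string &text,
                               int approximateLength = 3)
{
    return approximatePhonemeScore(text) >= approximateLength;
}
```

With these made-up weights, a single CJK ideograph already passes the default threshold, while two Latin letters do not. A real implementation would presumably draw on the character database rather than hard-coded ranges.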

This is also why I want to keep the API very minimal for now,
either this:

bool isMinimalSearchableLength(QString)

or at most:

bool isMinimalSearchableLength(QString, int approximateLength = 3)

Naming is subject to improvement of course, but approximateLength
here would mean "think of each unit as a phoneme", which is
what the implementation would approximate a score for, allowing
the dev to override the target. I'm not sure I even want to
expose that second parameter though, since it constrains the
behavior of the impl. This is also something I'd really like
feedback on: As a dev, how would you want to use it?
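
As one possible answer from the caller's side, here is a sketch of a search field that only fires a query once the check passes. The SearchField class, its members, and the injected LengthCheck callback are all hypothetical and exist only for this example; std::u32string again stands in for QString. The point is that the call site needs nothing beyond the one-argument form.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical search field: it delegates the "is this enough text
// to search on?" decision to a one-argument check, so the scoring
// behind that check can change without touching the caller.
class SearchField
{
public:
    using LengthCheck = std::function<bool(const std::u32string &)>;

    explicit SearchField(LengthCheck check)
        : m_check(std::move(check))
    {
    }

    // Called on every edit; returns true if a query was issued.
    bool textChanged(const std::u32string &text)
    {
        if (!m_check(text))
            return false; // too little information, skip the search
        m_queries.push_back(text); // placeholder for running the query
        return true;
    }

    const std::vector<std::u32string> &queries() const
    {
        return m_queries;
    }

private:
    LengthCheck m_check;
    std::vector<std::u32string> m_queries;
};
```

An app that wanted a different target would have to pass a different check, which is exactly the knob the second parameter would expose.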


> Aleix

Cheers,
Eike


More information about the Kde-frameworks-devel mailing list