D22303: Fix name grouping feature for cyrillic names

Eike Hein noreply at phabricator.kde.org
Thu Jul 18 05:06:58 BST 2019


hein added a comment.


  FYI: I gave this patch a spin with Korean and the grouping is not quite working the right way.
  
  In the Korean alphabet, multiple letters combine into morpho-syllabic blocks. For example, there's the consonant 'ㅋ' (k) and the vowels 'ㅏ' (a) and 'ㅗ' (o). When used together in a syllable, they're written as '카' and '코' (ka and ko). You'll notice '카' and '코' are single characters (each composed of two letters). Unicode contains code points for both the individual letters and the pre-composed syllable forms. So those syllables can be encoded in two ways: as sequences of individual letter code points that combine into a block, or as single pre-composed code points. This is similar to the way German umlauts work in Unicode, where ä can be encoded either as 'a' followed by the combining diaeresis character, or as the pre-composed code point. Unicode treats these variants as equivalent and defines normalization forms (such as NFC and NFD), with standardized algorithms for converting between them.
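
  To make the two encodings concrete, here is a minimal sketch using Python's standard `unicodedata` module (not Dolphin's C++ code; the helper name `code_points` is made up for illustration). It decomposes the pre-composed forms with NFD and recomposes them with NFC:

```python
import unicodedata

def code_points(text):
    """Return the code points of text as 'U+XXXX' strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Pre-composed forms, written as escapes so the source encoding is unambiguous.
KA = "\uCE74"       # 카
KO = "\uCF54"       # 코
A_UMLAUT = "\u00E4" # ä

for sample in (KA, KO, A_UMLAUT):
    nfd = unicodedata.normalize("NFD", sample)  # canonical decomposition
    nfc = unicodedata.normalize("NFC", nfd)     # canonical composition
    print(sample, code_points(sample), "->", code_points(nfd))
    assert nfc == sample  # the round trip restores the pre-composed form

# Both syllables decompose to the same leading consonant, U+110F.
# Note: this is the conjoining jamo, a different code point from the
# standalone compatibility letter 'ㅋ' (U+314B).
```

  Both '카' and '코' decompose to a sequence starting with U+110F, and ä decomposes to 'a' plus U+0308.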
  
  The grouping code in Dolphin currently operates naively on the character level without applying normalization first. As a result, words starting with '카' and '코' are put into two separate groups, even though their starting letter is actually the same ('ㅋ'). They should be in the same group and collated properly within it.
  
  It may not seem that way, but that's actually the same problem that this code is trying to solve by special-casing Latin: normalization could also be used to drop diacritics and accents from the letters to group them together.
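
  As a rough sketch of that idea (in Python for brevity, not a proposal for the actual Dolphin code; `group_key` and `group_names` are hypothetical names), a group key could be taken from the first non-combining code point of the NFD form. Sorting within a group is done by plain code point here, which is only a stand-in for proper locale-aware collation, and whether a given diacritic should be dropped at all depends on the language:

```python
import unicodedata
from collections import defaultdict

def group_key(name):
    """Return a grouping key: the first base letter of the NFD form.

    Hypothetical helper. Combining marks (accents, diacritics) are skipped,
    and a pre-composed Hangul syllable yields its leading consonant.
    """
    if not name:
        return ""
    decomposed = unicodedata.normalize("NFD", name)
    for ch in decomposed:
        if unicodedata.combining(ch) == 0:
            return ch.upper()
    # The name consists only of combining marks; fall back to the first one.
    return decomposed[0]

def group_names(names):
    """Group names by group_key and sort each group by code point.

    Code point order is an assumption made for this sketch; a real
    implementation would need locale-aware collation.
    """
    groups = defaultdict(list)
    for name in names:
        groups[group_key(name)].append(name)
    return {key: sorted(members) for key, members in groups.items()}

groups = group_names(["코드", "카페", "\u00C4rger", "apple"])
for key, members in groups.items():
    print(f"U+{ord(key):04X}", members)
```

  With this key, the two Korean names land in one group (keyed by U+110F) and the two Latin names in another (keyed by 'A'), without any script-specific special cases.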
  
  I'd say it's up to the Dolphin maintainers to decide whether they want this patch as a stop-gap for now, but eventually this code will need to be rewritten properly so it does the right thing generically for all scripts. It's really a Unicode support problem, not a Latin or Cyrillic problem.

REPOSITORY
  R318 Dolphin

REVISION DETAIL
  https://phabricator.kde.org/D22303

To: AndreyYashkin, #dolphin, ngraham, cfeck, elvisangelaccio
Cc: hein, cfeck, ngraham, elvisangelaccio, kfm-devel, aprcela, fprice, fbampaloukas, alexde, feverfew, meven, spoorun, navarromorales, firef, andrebarros, emmanuelp, mikesomov