D11552: [WIP] Handle CJK characters

Michael Heidelbach noreply at phabricator.kde.org
Wed Mar 21 17:34:15 UTC 2018


michaelh added a comment.


  In D11552#230870 <https://phabricator.kde.org/D11552#230870>, @alexeymin wrote:
  
  > Regarding this - `I don't know if it is really chinese look foreign enough to me anyway.`
  >  Some lines of text in your test script surely look like Japanese Hiragana to me, especially this one (and tests related to this)
  >
  >   echo "otto东到宛平路anna"> "終末なにしてますか?忙しいですか?救ってもらっていいですか? EP01 太阳の倾いたこの世界で -broken chronograph-.txt"
  >
  
  
  That's the only thing I was sure of (it was in fact an mkv I had just watched). At this stage the actual language does not really matter.
  
  > But do your ranges include those characters? This answer on Stack Overflow <https://stackoverflow.com/a/30200250/2323699> says that there are also other ranges for Hiragana, Katakana, etc., as @cfeck already said.
  
  My rationale was not to throw in every range mentioned on that Wikipedia page, but just enough to make this work and illustrate the general approach.
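  
  To illustrate the range-based idea, here is a minimal Python sketch (not the code in this revision). The block boundaries are the standard Unicode ones; the names `in_ranges`, `minimal` and `extended` are made up for this example, and the actual ranges in the patch may differ.

```python
# Illustrative Unicode block ranges; NOT the list used in the patch.
CJK_UNIFIED = (0x4E00, 0x9FFF)  # CJK Unified Ideographs
HIRAGANA = (0x3040, 0x309F)
KATAKANA = (0x30A0, 0x30FF)


def in_ranges(ch, ranges):
    """Return True if the code point of ch lies in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


minimal = [CJK_UNIFIED]                       # "just enough" set
extended = [CJK_UNIFIED, HIRAGANA, KATAKANA]  # with the kana blocks added

print(in_ranges("东", minimal))   # ideograph, covered by the minimal set
print(in_ranges("で", minimal))   # Hiragana, not covered by the minimal set
print(in_ranges("で", extended))  # covered once the kana blocks are added
```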
  
  > Does it pass the test for you?
  
  All except the last two, that is '*ですか? EP01' (a mixture of Latin and Hiragana) and 'ですか' (pure Hiragana). I could lie now and say I left out the Hiragana characters on purpose. I didn't, but for Hiragana the `one grapheme = one search term` rule does not apply, so those tests should in fact fail.
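  
  A rough Python sketch of the `one grapheme = one search term` idea, assuming only the CJK Unified Ideographs block is treated this way. `is_ideograph` and `split_terms` are hypothetical helpers for illustration, not Baloo functions, and the sketch ignores whitespace and punctuation handling.

```python
def is_ideograph(ch):
    # Assumption: only the CJK Unified Ideographs block counts here.
    return 0x4E00 <= ord(ch) <= 0x9FFF


def split_terms(text):
    """Emit each ideograph as its own term; keep all other runs together."""
    terms, run = [], []
    for ch in text:
        if is_ideograph(ch):
            if run:  # flush the pending non-ideograph run first
                terms.append("".join(run))
                run = []
            terms.append(ch)
        else:
            run.append(ch)
    if run:
        terms.append("".join(run))
    return terms


print(split_terms("otto东到宛平路anna"))
print(split_terms("ですか"))
```

  With these assumptions the first call yields `otto`, five single-ideograph terms and `anna`, while the pure Hiragana string stays one term.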
  
  @cfeck
  
  > if Baloo doesn't handle CJK, it maybe also doesn't handle other non-Latin scripts, so I suggest to use QChar::category()
  
  I wasn't aware of `QChar::category()`. Thank you.
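  
  For comparison, a small Python sketch of a category-based check. Python's `unicodedata.category()` is used here only as a stand-in for `QChar::category()`; it returns the two-letter Unicode general category, whereas Qt returns an enum value such as `QChar::Letter_Other`. The helper `is_letter` is an assumption for this example.

```python
import unicodedata


def is_letter(ch):
    """True for any Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


# Print the general category of a Latin letter, an ideograph,
# a Hiragana character and a punctuation mark.
for ch in "a东で?":
    print(ch, unicodedata.category(ch), is_letter(ch))
```

  Ideographs and Hiragana both fall into the same category (`Lo`), so the category tells letters from non-letters in any script but does not by itself say whether a character should become its own search term.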

REPOSITORY
  R293 Baloo

REVISION DETAIL
  https://phabricator.kde.org/D11552

To: michaelh, #baloo, #frameworks, lbeltrame, bruns
Cc: alexeymin, cfeck, ashaposhnikov, michaelh, astippich, spoorun, nicolasfella, ngraham
