D21553: add Korean Hangul jamo code point ranges

Mon Jun 3 18:47:56 BST 2019

pshinjo added a comment.

  First of all, thanks for the work. A few thoughts come to mind, mostly about classical Hangul and half-completed Hangul characters.

  1. U+AC00 .. U+D7AF (Hangul Syllables): no problem mapping a single QChar to one Hangul character.
  2. U+3130 .. U+318F (Hangul Compatibility Jamo): same as the Hangul Syllables block (one QChar per Hangul character), since the characters in this range are non-combining.
  3. Hangul Jamo, Hangul Jamo Extended-A and Extended-B (U+1100 .. U+11FF, U+A960 .. U+A97F, U+D7B0 .. U+D7FF): this is the tricky part, because what users see as a single "Hangul character" is not always a single QChar.
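  The three-way split in the list can be sketched as a small classifier. This is a minimal sketch, not existing Calligra code; the names `HangulRange` and `classifyHangul` are hypothetical. All of these ranges lie in the Basic Multilingual Plane, so every code point here is exactly one QChar; the split only decides whether a code point can be counted on its own (cases 1 and 2) or may combine with its neighbours (case 3).

```cpp
// Hypothetical classification mirroring cases 1 to 3 in the list above.
enum class HangulRange {
    NotHangul,
    Syllable,          // case 1: U+AC00..U+D7AF, one QChar per character
    CompatibilityJamo, // case 2: U+3130..U+318F, non-combining
    ConjoiningJamo     // case 3: may combine into one user-perceived character
};

constexpr HangulRange classifyHangul(char32_t cp) {
    if (cp >= 0xAC00 && cp <= 0xD7AF)
        return HangulRange::Syllable;
    if (cp >= 0x3130 && cp <= 0x318F)
        return HangulRange::CompatibilityJamo;
    if ((cp >= 0x1100 && cp <= 0x11FF) ||   // Hangul Jamo
        (cp >= 0xA960 && cp <= 0xA97F) ||   // Hangul Jamo Extended-A
        (cp >= 0xD7B0 && cp <= 0xD7FF))     // Hangul Jamo Extended-B
        return HangulRange::ConjoiningJamo;
    return HangulRange::NotHangul;
}

// Compile-time usage checks.
static_assert(classifyHangul(0xB098) == HangulRange::Syllable, "precomposed syllable");
static_assert(classifyHangul(0x3131) == HangulRange::CompatibilityJamo, "compatibility jamo");
static_assert(classifyHangul(0x110A) == HangulRange::ConjoiningJamo, "conjoining choseong");
static_assert(classifyHangul(U'A') == HangulRange::NotHangul, "not Hangul");
```

  Only code points classified as `ConjoiningJamo` need a boundary rule; the other two cases can keep the current one-QChar-per-character counting.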

  Take '나랏말ᄊᆞ미' as an example. The 'ᄊᆞ' part may be seen as a single character if the rendering font combines U+110A and U+119E. This and other classical Hangul characters cannot be "normalized" into a single Unicode code point/QChar, and the same is true of half-completed characters (cho+jong, jung+jong). If the underlying font does not combine the two (e.g. because it does not support classical Hangul), users will perceive them as two separate characters; otherwise, as a single character. If we can get the font information here, the statistics could follow how the font actually renders these characters (as two or as one). If not, KS X 1026-1 [1] could be used as a guideline for determining the boundary of a single character.
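  As a font-independent point of comparison with KS X 1026-1, the extended grapheme cluster rules of Unicode UAX #29 (rules GB6 to GB8) define when conjoining jamo stay together. The sketch below applies only those Hangul rules, using the Hangul_Syllable_Type ranges from the Unicode Character Database; it is not KS X 1026-1 itself, and the names `hangulSyllableType`, `joins` and `countHangulClusters` are hypothetical. Non-Hangul code points always start a new cluster, which is a simplification (UAX #29 has further rules, e.g. for combining marks). Requires C++17 for `std::u32string_view`.

```cpp
#include <cstddef>
#include <string_view>

// Hangul_Syllable_Type values from the Unicode Character Database.
enum class HangulType { Other, L, V, T, LV, LVT };

constexpr HangulType hangulSyllableType(char32_t cp) {
    if ((cp >= 0x1100 && cp <= 0x115F) || (cp >= 0xA960 && cp <= 0xA97C))
        return HangulType::L;   // choseong (leading consonants)
    if ((cp >= 0x1160 && cp <= 0x11A7) || (cp >= 0xD7B0 && cp <= 0xD7C6))
        return HangulType::V;   // jungseong (vowels), incl. U+1160 filler
    if ((cp >= 0x11A8 && cp <= 0x11FF) || (cp >= 0xD7CB && cp <= 0xD7FB))
        return HangulType::T;   // jongseong (trailing consonants)
    if (cp >= 0xAC00 && cp <= 0xD7A3)   // precomposed syllables
        return (cp - 0xAC00) % 28 == 0 ? HangulType::LV : HangulType::LVT;
    return HangulType::Other;
}

// UAX #29 rules GB6 to GB8 only: true if there is no boundary between prev and next.
constexpr bool joins(HangulType prev, HangulType next) {
    switch (prev) {
    case HangulType::L:                        // GB6: L x (L | V | LV | LVT)
        return next == HangulType::L || next == HangulType::V ||
               next == HangulType::LV || next == HangulType::LVT;
    case HangulType::LV:
    case HangulType::V:                        // GB7: (LV | V) x (V | T)
        return next == HangulType::V || next == HangulType::T;
    case HangulType::LVT:
    case HangulType::T:                        // GB8: (LVT | T) x T
        return next == HangulType::T;
    default:
        return false;                          // a non-Hangul code point joins nothing
    }
}

// Counts user-perceived characters by starting a new one at every boundary.
constexpr std::size_t countHangulClusters(std::u32string_view text) {
    std::size_t count = 0;
    HangulType prev = HangulType::Other;
    for (char32_t cp : text) {
        const HangulType cur = hangulSyllableType(cp);
        if (!joins(prev, cur))
            ++count;
        prev = cur;
    }
    return count;
}

// '나랏말ᄊᆞ미': six code points, five clusters (U+110A U+119E join).
static_assert(countHangulClusters(U"\uB098\uB78F\uB9D0\u110A\u119E\uBBF8") == 5, "");
```

  Under these rules 'ᄊᆞ' (L followed by V) always counts as one character regardless of the font, while a bare cho+jong pair (L followed by T) counts as two unless a jungseong filler (U+1160) sits between them. In Qt, QTextBoundaryFinder with the Grapheme boundary type implements the full UAX #29 rules and may be worth comparing against before hand-rolling anything.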

  Have you checked how other word processors handle this issue? We could also build some test cases around it.

  [1] http://www.unicode.org/L2/L2008/08225-n3422.pdf

REPOSITORY
  R8 Calligra

REVISION DETAIL
  https://phabricator.kde.org/D21553

To: daehyuns, Calligra-Devel-list
Cc: jachin, pshinjo, hein, dcaliste, cochise, vandenoever