[Konsole-devel] [Bug 96536] New: Unicode decomposed text gets garbled in Konsole (NFD mode)
Thiago Macieira
thiago.macieira at kdemail.net
Fri Jan 7 16:26:48 UTC 2005
http://bugs.kde.org/show_bug.cgi?id=96536
Summary: Unicode decomposed text gets garbled in Konsole (NFD mode)
Product: konsole
Version: 1.5 Beta
Platform: unspecified
OS/Version: Linux
Status: NEW
Severity: normal
Priority: NOR
Component: general
AssignedTo: konsole-devel kde org
ReportedBy: thiago.macieira kdemail net
Version: 1.5 Beta (using KDE 3.3.91 (beta1), compiled sources)
Compiler: gcc version 3.4.3
OS: Linux (i686) release 2.6.9
There's an open bug report dealing with generic Unicode problems in Konsole: bug #74190. This report is about one specific problem.
There are two forms of diacritic characters in Unicode: the composed and the decomposed form. In the composed form, a single code point is assigned to a given letter+diacritic combination. In the decomposed form, the same character is made of two code points: the base letter without the diacritic, followed by a combining diacritic character.
For instance, the LATIN SMALL LETTER A WITH ACUTE (á) is assigned U+00E1. That's the composed, or NFC, form. But it's also possible to generate the same glyph by combining LATIN SMALL LETTER A (a) with COMBINING ACUTE ACCENT: U+0061 U+0301: á (depending on your font, it may show as an "a" with a block above). That's the decomposed, or NFD, form.
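A quick way to see the two forms side by side is Python's standard unicodedata module (used here purely for illustration; Konsole itself is not involved). This sketch lists the code points of each form and checks that NFC/NFD normalization converts between them:

```python
import unicodedata

# U+00E1 LATIN SMALL LETTER A WITH ACUTE: the precomposed (NFC) form.
composed = "\u00e1"
# U+0061 LATIN SMALL LETTER A + U+0301 COMBINING ACUTE ACCENT: the decomposed (NFD) form.
decomposed = "a\u0301"


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    print(describe(composed))    # one code point
    print(describe(decomposed))  # two code points
    # Normalization converts between the two forms in both directions.
    print(unicodedata.normalize("NFD", composed) == decomposed)
    print(unicodedata.normalize("NFC", decomposed) == composed)
```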
The problem is that Konsole turns some of those combinations from NFD to NFC:
$ echo á | od -tx1
0000000 61 cc 81 0a
$ touch á
$ ls
á
Now, if you copy & paste the listed value, here's what happens:
$ echo á | od -tx1
0000000 c3 a1 0a
$ ls á
ls: á: No such file or directory
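The od output above is simply the UTF-8 encoding of each form plus echo's trailing newline. The sketch below (Python, with a hypothetical od_tx1 helper that mimics only the byte column of `od -tx1`, not its offset column) reproduces both dumps and shows why ls fails: assuming a typical Linux file system that stores names as raw bytes without normalizing them, the NFC name pasted from Konsole is a different file name from the NFD one created by touch, even though the two are canonically equivalent.

```python
import unicodedata


def od_tx1(data: bytes) -> str:
    """Hypothetical helper: space-separated lowercase hex bytes, like the
    byte column of `od -tx1` (the offset column is omitted)."""
    return " ".join(f"{b:02x}" for b in data)


nfd_name = "a\u0301"                               # what `touch` received
nfc_name = unicodedata.normalize("NFC", nfd_name)  # what the paste produced

if __name__ == "__main__":
    # `echo` appends a newline (0a), as in the transcript.
    print(od_tx1((nfd_name + "\n").encode("utf-8")))  # 61 cc 81 0a
    print(od_tx1((nfc_name + "\n").encode("utf-8")))  # c3 a1 0a
    # Compared as raw bytes (as the file system does), the names differ...
    print(nfd_name.encode("utf-8") == nfc_name.encode("utf-8"))  # False
    # ...even though they are canonically equivalent.
    print(unicodedata.normalize("NFD", nfc_name) == nfd_name)  # True
```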
For other characters, the combining modifier is simply discarded:
$ touch d́
$ ls
á d́
copy & paste:
$ ls d
ls: d: No such file or directory
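This case can't be explained as NFD-to-NFC conversion, because Unicode has no precomposed "d with acute", so NFC leaves the pair untouched. The sketch below checks that, and uses a hypothetical strip_combining function as a model of the observed behaviour (it is not Konsole's code) which yields exactly the "d" that was pasted:

```python
import unicodedata

nfd_d = "d\u0301"  # LATIN SMALL LETTER D + COMBINING ACUTE ACCENT


def strip_combining(text: str) -> str:
    """Hypothetical model of the observed behaviour: drop every code point
    whose canonical combining class is non-zero."""
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)


if __name__ == "__main__":
    # No precomposed "d with acute" exists, so NFC keeps both code points.
    print(unicodedata.normalize("NFC", nfd_d) == nfd_d)  # True
    # Dropping the combining mark reproduces the pasted name.
    print(strip_combining(nfd_d))  # d
```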
Just for fun, let's try adding another combining char: COMBINING ACUTE ACCENT BELOW (U+0317): á̗
(copy & paste from Konsole:)
$ ls
á d
For comparison:
- Konqueror works fine. No glitches.
- xterm has glitches. NFD d́ doesn't get changed, but NFD á changes to á, whereas NFD á̗ changes into a mixed form (NFC á + combining)
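For the double-accent case, it helps to look at what Unicode normalization itself does with a + U+0301 + U+0317 (assuming that is the sequence typed above). The acute accent has canonical combining class 230 and the accent below has 220, so normalization reorders them, and NFC then composes only the a + acute pair. The result, U+00E1 followed by U+0317, is the same "NFC á + combining" mixed form that xterm produced, which suggests (though this report doesn't confirm it) that xterm applies NFC-style composition here:

```python
import unicodedata

# a + COMBINING ACUTE ACCENT (U+0301) + COMBINING ACUTE ACCENT BELOW (U+0317)
text = "a\u0301\u0317"


def code_points(s: str) -> list[str]:
    """Return 'U+XXXX' for every code point in s."""
    return [f"U+{ord(ch):04X}" for ch in s]


if __name__ == "__main__":
    # Canonical combining classes: 230 (above) and 220 (below).
    print(unicodedata.combining("\u0301"), unicodedata.combining("\u0317"))
    # NFD sorts the marks by combining class, so the "below" mark moves first.
    print(code_points(unicodedata.normalize("NFD", text)))
    # NFC composes a + U+0301 into U+00E1 (U+0317 does not block it);
    # there is no precomposed form for the accent below, so it stays separate.
    print(code_points(unicodedata.normalize("NFC", text)))
```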