[Konsole-devel] [Bug 96536] New: Unicode decomposed text gets garbled in Konsole (NFD mode)

Thiago Macieira thiago.macieira at kdemail.net
Fri Jan 7 16:26:48 UTC 2005


http://bugs.kde.org/show_bug.cgi?id=96536        
           Summary: Unicode decomposed text gets garbled in Konsole (NFD
                    mode)
           Product: konsole
           Version: 1.5 Beta
          Platform: unspecified
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: NOR
         Component: general
        AssignedTo: konsole-devel kde org
        ReportedBy: thiago.macieira kdemail net


Version:           1.5 Beta (using KDE 3.3.91 (beta1), compiled sources)
Compiler:          gcc version 3.4.3
OS:                Linux (i686) release 2.6.9

There is an open bug report about general Unicode problems in Konsole: bug #74190. This report covers one specific problem.

Unicode has two forms for letters with diacritics: composed and decomposed. In the composed form, a single code point is assigned to a given letter+diacritic. In the decomposed form, the combination is made of two "characters": the base letter without the diacritic, followed by a combining diacritic character.

For instance, the LATIN SMALL LETTER A WITH ACUTE (á) is assigned U+00E1. That's the composed, or NFC, form. But it's also possible to generate the same glyph by combining LATIN SMALL LETTER A (a) with COMBINING ACUTE ACCENT: U+0061 U+0301: á (depending on your font, it may show as an "a" with a block above). That's the decomposed, or NFD, form.
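The two forms can be checked with Python's standard `unicodedata` module. The following is a small illustration and was not part of the original report; it assumes Python 3.8 or later for `bytes.hex(" ")`. It converts between the two spellings and prints their UTF-8 bytes, which should match the `od -tx1` output in the transcripts below (without the trailing newline, `0a`).

```python
import unicodedata

composed = "\u00e1"     # LATIN SMALL LETTER A WITH ACUTE (NFC)
decomposed = "a\u0301"  # LATIN SMALL LETTER A + COMBINING ACUTE ACCENT (NFD)

# Both render as the same glyph, but they are different code point sequences.
print([f"U+{ord(c):04X}" for c in composed])    # ['U+00E1']
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0061', 'U+0301']

# Normalization converts between the two canonically equivalent forms.
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == composed

# UTF-8 bytes, as `od -tx1` would show them (without the trailing 0a).
print(composed.encode("utf-8").hex(" "))    # c3 a1
print(decomposed.encode("utf-8").hex(" "))  # 61 cc 81
```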

The problem is that Konsole converts some of these combinations from NFD to NFC:

$ echo á | od -tx1
0000000 61 cc 81 0a
$ touch á
$ ls
á

Now, if you copy & paste the listed value, here's what happens:
$ echo á | od -tx1
0000000 c3 a1 0a
$ ls á
ls: á: No such file or directory
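On typical Linux filesystems a file name is an opaque byte string, so the NFC spelling produced by the paste does not name the file created with the NFD spelling. The sketch below is only an illustration. `find_equivalent` is a hypothetical helper, not an existing API. The sketch shows the byte mismatch, and shows how a program could match names by canonical equivalence instead of raw bytes.

```python
import unicodedata

def find_equivalent(name, entries):
    """Hypothetical helper: return the entries canonically equivalent to `name`.

    Compares NFC-normalized forms instead of raw bytes.
    """
    key = unicodedata.normalize("NFC", name)
    return [e for e in entries if unicodedata.normalize("NFC", e) == key]

on_disk = ["a\u0301"]  # the file created with `touch`, NFD spelling
pasted = "\u00e1"      # what came back from copy & paste, NFC spelling

# A byte-for-byte comparison (how the file name is matched on disk) misses it:
print(pasted.encode() in [e.encode() for e in on_disk])  # False

# A canonical-equivalence comparison finds the NFD entry:
print(find_equivalent(pasted, on_disk) == ["a\u0301"])   # True
```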

For other characters, the combining modifier is simply discarded:
$ touch d́
$ ls
á   d́

After copy & paste:
$ ls d
ls: d: No such file or directory
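Unicode has no precomposed "d with acute", so even NFC keeps d́ as two code points. Losing the mark therefore cannot be explained by normalization alone. The sketch below is a hypothetical approach, not Konsole's actual design. It keeps each base character together with the combining marks that follow it in one "cell", so copying the cell returns the whole sequence. It treats any code point with a non-zero canonical combining class as a mark. This is a simplification of full grapheme clustering (UAX #29). It also assumes Python 3.8+.

```python
import unicodedata

def cells(text):
    """Group each base character with the combining marks that follow it.

    Simplification: a code point counts as a combining mark when its
    canonical combining class is non-zero. Full grapheme clustering
    (UAX #29) is more involved.
    """
    result = []
    for ch in text:
        if result and unicodedata.combining(ch):
            result[-1] += ch   # attach the mark to the previous cell
        else:
            result.append(ch)  # start a new cell
    return result

d_acute = "d\u0301"  # LATIN SMALL LETTER D + COMBINING ACUTE ACCENT

# There is no precomposed "d with acute", so NFC leaves two code points.
print(len(unicodedata.normalize("NFC", d_acute)))  # 2

# Keeping marks attached to their base means a copied cell keeps both.
print([c.encode().hex(" ") for c in cells(d_acute + "x")])  # ['64 cc 81', '78']
```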

Just for fun, let's try adding another combining char: COMBINING ACUTE ACCENT BELOW (U+0317): á̗

(copy & paste from Konsole:)
$ ls
á  d

For comparison:
- Konqueror works fine. No glitches.
- xterm has glitches: NFD d́ is left unchanged, but NFD á is changed to á (NFC), and NFD á̗ is changed into a mixed form (NFC á followed by the remaining combining character).
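Python's `unicodedata` module can show why the three-code-point sequence ends up in that mixed form. U+0317 has a lower canonical combining class (220, below) than U+0301 (230, above). NFD therefore sorts U+0317 first. NFC then composes a + U+0301 into U+00E1 and leaves U+0317 after it. The mixed form reported for xterm is exactly the NFC form of this sequence. xterm's results for all three examples are therefore consistent with it applying NFC, though that is an inference, not something verified in xterm's source. The helper `code_points` is only for display.

```python
import unicodedata

seq = "a\u0301\u0317"  # a + COMBINING ACUTE ACCENT + COMBINING ACUTE ACCENT BELOW

def code_points(s):
    """Display helper: format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Canonical combining classes decide the canonical order of the marks.
print(unicodedata.combining("\u0301"), unicodedata.combining("\u0317"))  # 230 220

# NFD sorts marks by combining class, so the "below" accent moves first.
print(code_points(unicodedata.normalize("NFD", seq)))  # U+0061 U+0317 U+0301

# NFC then composes a + U+0301 into U+00E1 and leaves U+0317 attached.
print(code_points(unicodedata.normalize("NFC", seq)))  # U+00E1 U+0317
```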


