Can I display Chinese character filenames in an
James Richard Tyrer
tyrerj at acm.org
Wed Oct 6 19:54:09 BST 2004
Robin Rosenberg wrote:
> On Monday 04 October 2004 18.35, James Richard Tyrer wrote:
>>Robin Rosenberg wrote:
>>>On Monday 04 October 2004 04.56, James Richard Tyrer wrote:
>>>>Obviously, what I said is not Chinese specific. It applies to any and
>>>>all UTF-8 encoded file names. ISO-8859-1 is a subset of UTF-8 so Latin
>>>>characters will display just the same.
>>>No. ASCII is a subset of UTF-8. ISO-8859-1 and UTF-8 are different and
>>>incompatible (or I would be using UTF-8 today).
>>I have: "LANG=en_us.utf8" and I have no problems. IIRC, that is what I
>>have read at authoritative sources. But, do you mean that glyphs 128-255
>>are not the same in ISO-8859-1 and UTF-8? Perhaps there are some problems
>>that I am not aware of since all I ever use (128-255) are Latin letters
>>with diacritical marks. It does appear that odd combinations of characters
>>could be interpreted as something other than ISO-8859-1.
> ISO-8859-1 is both an encoding and a character set, while UTF-8 is only an
> encoding for the Unicode character set. The code points of the two overlap in
> the first 256 positions. As encodings, only the first 128 positions (0-127)
> are identical. UTF-8 can encode all characters in the ISO-8859-1 character
> set, but it does so differently, using a variable-length encoding.
> The filename "åäö" can be stored as the byte sequence [e5 e4 f6] when my
> locale is set to ISO-8859-1 or [c3 a5 c3 a4 c3 b6] when using UTF-8. I can't
> have it both ways. Under an ISO-8859-1 locale, the UTF-8 encoding shows up as
> "Ã¥Ã¤Ã¶" (unreadable garbage). In order to switch my locale from ISO-8859-1
> to UTF-8 I have to convert my filenames, as most non-ASCII filenames would be
> illegal in UTF-8 (not that many programs care). The others (non-ASCII again)
> will look wrong.
> Do "ls filenamewithdiacriticalmarks|od -tx1" and you'll see a variable-length
> encoding with one or two bytes depending on the character (Chinese characters
> are even longer). UTF-8 could require up to six bytes for a single character.
> I'm not sure if the Unicode Consortium has defined any such character yet.
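For anyone who wants to check the byte sequences quoted above without touching real files, here is a minimal Python sketch. It only works on strings and bytes in memory. The helper name latin1_name_to_utf8 is made up for illustration and is not part of any tool mentioned in this thread. Note also that the six-byte limit comes from the original UTF-8 design; RFC 3629 restricts UTF-8 to at most four bytes per character.

```python
# Sketch: the same three-character name under the two encodings.
name = "åäö"

latin1_bytes = name.encode("iso-8859-1")  # one byte per character
utf8_bytes = name.encode("utf-8")         # two bytes per character here

print(latin1_bytes.hex(" "))  # e5 e4 f6
print(utf8_bytes.hex(" "))    # c3 a5 c3 a4 c3 b6

# Reading the UTF-8 bytes as if they were ISO-8859-1 gives the garbage
# string quoted in the message.
mojibake = utf8_bytes.decode("iso-8859-1")

# The ISO-8859-1 bytes are not a legal UTF-8 sequence at all.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False


def latin1_name_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode one ISO-8859-1 filename as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


# UTF-8 is variable length: count the bytes needed per character.
samples = {
    "a": "ASCII letter",
    "å": "Latin letter with diacritic",
    "中": "common Chinese character",
    "\U00020000": "CJK Extension B ideograph",
}
lengths = []
for ch, desc in samples.items():
    encoded = ch.encode("utf-8")
    lengths.append(len(encoded))
    print(f"U+{ord(ch):04X} {desc}: {len(encoded)} byte(s)")
```

The byte counts come out as 1, 2, 3 and 4 for the four sample characters, which is what the od output would show for filenames containing them.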
I do note two things:
The first 256 code points of Unicode *are* the same as ISO-8859-1.
It appears that KDE's clipboard converts to UTF-8 automatically.
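The difference between "same code points" and "same bytes" can be checked with a short Python sketch. This is only an illustration of the point being discussed, and the variable names are arbitrary.

```python
# Every byte value 0..255, decoded as ISO-8859-1.
all_bytes = bytes(range(256))
text = all_bytes.decode("iso-8859-1")

# Each ISO-8859-1 byte maps to the Unicode code point with the same number.
same_code_points = all(ord(ch) == b for ch, b in zip(text, all_bytes))
print(same_code_points)  # True

# But the UTF-8 *encoding* of a code point equals the single ISO-8859-1
# byte only for the ASCII range.
identical = [b for b in range(256) if chr(b).encode("utf-8") == bytes([b])]
print(identical[0], identical[-1], len(identical))  # 0 127 128
```

So both statements hold: the first 256 code points match ISO-8859-1, while the stored bytes match only for 0-127. Code points 128-255 each take two bytes in UTF-8.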
This message is from the kde mailing list.
Account management: https://mail.kde.org/mailman/listinfo/kde.
More info: http://www.kde.org/faq.html.