Can I display Chinese character filenames in an

Wed Oct 6 19:54:09 BST 2004

Robin Rosenberg wrote:
> On Monday 04 October 2004 18.35, James Richard Tyrer wrote:
> 
>>Robin Rosenberg wrote:
>>
>>>On Monday 04 October 2004 04.56, James Richard Tyrer wrote:
>>>
>>>>Obviously, what I said is not Chinese specific.  It applies to any and
>>>>all UTF-8 encoded file names.  ISO-8859-1 is a subset of UTF-8 so Latin
>>>>characters will display just the same.
>>>
>>>No. ASCII is a subset of UTF-8.  ISO-8859-1 and UTF-8 are different and
>>>incompatible (or I would be using UTF-8 today).
>>
>>I have: "LANG=en_us.utf8" and I have no problems.  IIRC, that is what I
>>have read at authoritative sources.  But, do you mean that glyphs 128-255
>>are not the same in ISO-8859-1 and UTF-8?  Perhaps there are some problems
>>that I am not aware of since all I ever use (128-255) are Latin letters
>>with diacritical marks.  It does appear that odd combinations of characters
>>could be interpreted as something other than ISO-8859-1.
> 
> 
> ISO-8859-1 is both an encoding and a character set, while UTF-8 is only an 
> encoding for the Unicode character set. The code points of the two character 
> sets coincide for the first 256 positions.  As encodings, however, only the 
> first 128 positions (0-127, i.e. ASCII) are identical. UTF-8 can encode all 
> characters in the ISO-8859-1 character set, but it does so differently, using 
> a variable-length encoding.
> 
> The filename "åäö" can be stored as the byte sequence [e5 e4 f6] when my 
> locale is set to ISO-8859-1 or [c3 a5 c3 a4 c3 b6] when using UTF-8. I can't
> have it both ways. In an ISO-8859-1 locale the UTF-8 encoding shows up as 
> "Ã¥Ã¤Ã¶" (unreadable garbage). In order to switch my locale from ISO-8859-1 
> to UTF-8 I have to convert my filenames, as most non-ASCII filenames would be 
> invalid UTF-8 (not that many programs care). The others (non-ASCII again) 
> will look wrong.
> 
> Do "ls filenamewithdiacriticalmarks|od -tx1" and you'll see a variable-length
> encoding with one or two bytes depending on the character (Chinese characters 
> are even longer). UTF-8 as originally designed could require up to six bytes 
> for a single character. I'm not sure if the Unicode Consortium has defined any 
> such character yet.
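
The byte sequences quoted above can be checked without touching any real
filenames. What follows is only a sketch in Python (not something posted in
the thread); the variable names are made up for illustration, and it assumes
Python 3.8 or later for bytes.hex() with a separator.

```python
# Sketch: encode the same name both ways and compare the raw bytes.
name = "\u00e5\u00e4\u00f6"  # the string "åäö", written with escapes

latin1_bytes = name.encode("iso-8859-1")  # one byte per character
utf8_bytes = name.encode("utf-8")         # two bytes per character here

print(latin1_bytes.hex(" "))  # e5 e4 f6
print(utf8_bytes.hex(" "))    # c3 a5 c3 a4 c3 b6

# Reading the UTF-8 bytes as if they were ISO-8859-1 gives the garbage form.
mojibake = utf8_bytes.decode("iso-8859-1")

# The ISO-8859-1 bytes, read as UTF-8, are not merely wrong but invalid.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

# A common Chinese character (U+4E2D) takes three bytes in UTF-8.
chinese_len = len("\u4e2d".encode("utf-8"))
```

This only looks at strings in memory; what "ls" shows for a real file still
depends on the bytes stored on disk and on the locale of the terminal.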

I do note two things:

The first 256 code points of Unicode *are* the same as ISO-8859-1.
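
A short Python sketch (my own illustration, with made-up variable names)
suggests how this fits with the point about encodings: the character numbers
agree for all 256 positions, but the stored bytes agree only for 0-127. It
assumes Python's "iso-8859-1" codec, which maps every byte value straight to
the code point of the same number.

```python
# Sketch: compare ISO-8859-1 and Unicode as character sets, then as bytes.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# As character sets: byte value N in ISO-8859-1 is Unicode code point N.
same_code_points = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

# As encodings: collect the code points whose UTF-8 form is that same
# single byte. Everything from 128 up needs two bytes in UTF-8.
same_bytes = [cp for cp in range(256) if chr(cp).encode("utf-8") == bytes([cp])]

print(same_code_points)                # True
print(same_bytes[0], same_bytes[-1])   # 0 127
```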

It appears that KDE's clipboard converts to UTF-8 automatically.

--
JRT
___________________________________________________
This message is from the kde mailing list.