Character sets / encoding

Thu Sep 10 07:54:25 BST 2009

Hi Anne,

On 2009-09-10 14:26, Anne Wilson wrote:
> On Wednesday 09 September 2009 21:35:05 James Tyrer wrote:
[...]
>> The >=128 glyphs which I commonly use are: äëïöüñ.  Since I am sending
>> this email in ISO 8859-1, these characters will not appear correctly if
>> viewed with UTF-8.
>>
>> I have found that the only solution to this problem is to set the code
>> page for incoming mail to either ISO 8859-1 or IBM cp 1252.
> 
> Not sure what's happening, James.  If the characters you typed were umlauted,
> as they seem to be, then they are reading correctly on this netbook (I'll
> check on another machine later).  Here KMail is set to use the following:
> 
> utf-8
> utf-8 (locale)
> us-ascii
> iso-8859-1
> 
> Now whether that means that if one doesn't fit it falls back to the next one, I 
> don't know.  What do you think?

The real problem with charsets and encodings is that you always have to tell
the interpreting program (browser, mail/news reader, or whichever program
wants to show the bits from the net in a readable form) which charset (and
encoding) was actually used to encode the message, so that it can choose
the matching decoder.
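
To illustrate, here is a minimal Python sketch of the same bytes being read
with different decoders. The names `wire_bytes` and `decode_as` are made up
for this example; the codecs are the ones from Python's standard library.

```python
# The six characters from James's mail, encoded the way his mail client
# sent them (ISO 8859-1: one byte per character).
text = "äëïöüñ"
wire_bytes = text.encode("iso-8859-1")


def decode_as(data: bytes, charset: str) -> str:
    """Decode data with the given charset.

    Bytes that are invalid in that charset are replaced with U+FFFD
    (the replacement character) instead of raising an error.
    """
    return data.decode(charset, errors="replace")


# The right decoder recovers the text; cp1252 agrees for these characters.
print(decode_as(wire_bytes, "iso-8859-1"))
print(decode_as(wire_bytes, "cp1252"))

# The wrong decoder cannot: these bytes are not valid UTF-8.
print(decode_as(wire_bytes, "utf-8"))
```

The bytes never change; only the decoder chosen to interpret them does.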

If this information is not given, the program can only guess, and everybody
knows that computers are not good at that. How would a computer know what the
string 'äëïöüñ' from James should actually look like if he hadn't specified
the encoding in the header? (Open the source code of his mail and you will see
the following line: Content-Type: text/plain; charset="iso-8859-1".)
The computer could then (for example) have guessed that those bits were
supposed to mean "潆秭" ("eddy billion" in Chinese)... OK, I admit, I cheated a
bit on this one: it wouldn't have been a valid bit sequence for a GBK decoder,
which any sane guessing algorithm would have detected. But still, I think you
get the point.
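
As a rough sketch of "use the header if it is there, guess otherwise", the
following Python uses the standard `email` module to read the declared
charset. The sample message, the function `decode_body` and the fallback
list are assumptions for illustration only; this is not a description of how
KMail or any other mail client actually implements its fallback list.

```python
from email import message_from_bytes

# A tiny hand-made message: one header, a blank line, then an 8-bit body
# holding the ISO 8859-1 bytes for the six accented characters.
raw_mail = (
    b'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    b"\r\n"
    b"\xe4\xeb\xef\xf6\xfc\xf1"
)


def decode_body(payload, declared, fallbacks):
    """Return (text, charset_used).

    If a charset was declared, trust it. Otherwise try each fallback in
    order and return the first one that decodes without error (a guess).
    """
    if declared:
        return payload.decode(declared), declared
    for charset in fallbacks:
        try:
            return payload.decode(charset), charset
        except UnicodeDecodeError:
            continue  # not valid in this charset, try the next one
    raise ValueError("no candidate charset could decode the payload")


msg = message_from_bytes(raw_mail)
declared = msg.get_content_charset()    # charset parameter, lower-cased
payload = msg.get_payload(decode=True)  # body as raw bytes

# With the header: no guessing needed.
print(decode_body(payload, declared, []))

# Without the header: the result depends entirely on the order of the list.
print(decode_body(payload, None, ["utf-8", "us-ascii", "iso-8859-1"]))
print(decode_body(payload, None, ["iso-8859-1", "utf-8"]))
```

Note that ISO 8859-1 assigns a character to every possible byte value, so
it never fails to decode. That makes it a catch-all at the end of a fallback
list, but it also means a successful decode proves nothing about whether the
guess was right.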

So, people, use Unicode (the "universal charset") encoded as UTF-8 for
everything - and maybe in a few years we can all forget about all this
charset/encoding mess :)

Patrick.

P.S.: I used Unicode/UTF-8 in this mail (and of course it's specified in the
mail's header), otherwise it wouldn't even have been possible to put both
Chinese characters and umlauts in one mail.
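
A small Python sketch of that last point, assuming nothing beyond the
standard codecs (the helper `can_encode` is a made-up name for this
example):

```python
# One string mixing the accented Latin letters and the Chinese characters
# used earlier in this mail.
mixed = "äëïöüñ 潆秭"


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# UTF-8 can represent every Unicode character, so the round trip is lossless.
utf8_bytes = mixed.encode("utf-8")
print(utf8_bytes.decode("utf-8") == mixed)

# ISO 8859-1 covers the accented letters but has no Chinese characters.
print(can_encode("äëïöüñ", "iso-8859-1"))
print(can_encode(mixed, "iso-8859-1"))
print(can_encode(mixed, "utf-8"))
```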

-- 
Key ID: 0x86E346D4            http://patrick-nagel.net/key.asc
Fingerprint: 7745 E1BE FA8B FBAD 76AB 2BFC C981 E686 86E3 46D4
