Character sets / encoding

Tue Sep 8 13:30:04 BST 2009

On Tuesday 08 September 2009 08:48:01 Peter Lewis wrote:
> On Monday 07 Sep 2009 Peter Lewis sent:
> > On Sunday 06 Sep 2009 Anne Wilson sent:
> > > In KMail I have problems with accented characters, resulting in things
> > > like L�ck.  I assume this is a problem of character encoding.  There
> > > doesn't seem to be anywhere in systemsettings that I can check and
> > > possibly alter that. Any suggestions?
> >
> > I find this sort of behaviour on several websites as viewed in Firefox,
> > often where a British pound sign should be. When I view the source in a
> > tool that shows the hexadecimal values of the characters (okteta, for
> > example) I find that they are all values greater than 0x7f, that is,
> > outside the 7-bit ASCII range that most character sets share.
> > I just assumed that the funny negative question mark is a way of saying
> > "what the heck".
> >
> > The characters that you sent were 0x4c 0xef 0xbf 0xbd 0x63 0x6b so I am
> > not surprised that nothing much could be done with it.
>
> May I enlarge on my rather hasty post of last night.
>
> The sequence 0xef 0xbf 0xbd is the UTF-8 encoding of U+FFFD, the Unicode
> replacement character. It is the decoding application's way of saying that
> it met a byte sequence that is not legal UTF-8.
>
> I have found three pages in Wikipedia that describe it better than I can:
> http://en.wikipedia.org/wiki/UTF-8 will show the utf-8 encoding technique.
> http://en.wikipedia.org/wiki/Mapping_of_Unicode_character_planes#Basic_Multilingual_Plane
> is the mapping of character sets onto the unicode number plane.
> http://en.wikipedia.org/wiki/Unicode_Specials is the key page that
> describes what can go wrong to give you the "what the heck?".
>
> I hope that this clears up everything and gives haters of the big
> red-mondster the smug feeling that it was all caused by a dirty "quick fix"
> colliding with a well thought out solution!

I simply gave up because it seemed there was nothing I could do about it :-).
However, thanks for the links to explanations.  It does fit, in that you say
that the codes sent were 0x4c 0xef 0xbf 0xbd 0x63 0x6b.  The 'wrong' character
was a u-umlaut, which, according to kcharselect, is

General Character Properties
Block: Latin-1 Supplement
Unicode category: Letter, Lowercase
Various Useful Representations
UTF-8: 0xC3 0xBC
UTF-16: 0x00FC
C octal escaped UTF-8: \303\274
XML decimal entity: &#252;

therefore the original was not, I assume, composed in UTF-8.  I suspect that
the mail was sent from a works computer running Windows.
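
To illustrate that reasoning, here is a minimal Python sketch of how the
bytes quoted above could come about.  It rests on an assumption that nothing
in this thread confirms: that the sender's client encoded the text as Latin-1
(or Windows-1252, which uses the same byte for u-umlaut) and that some program
further along decoded it as UTF-8, substituting U+FFFD for the byte it could
not interpret.  The variable names are made up for the example.

```python
# Assumption: the original word was "Lück", written by a client using
# Latin-1 / Windows-1252 rather than UTF-8.
original = "L\u00fcck"                      # "Lück"; U+00FC is u-umlaut

# In Latin-1 the u-umlaut is the single byte 0xFC.
latin1_bytes = original.encode("latin-1")   # b'L\xfcck'

# 0xFC is never valid in UTF-8, so a lenient decoder replaces it with
# U+FFFD REPLACEMENT CHARACTER instead of raising an error.
garbled = latin1_bytes.decode("utf-8", errors="replace")

# Re-encoding the damaged text as UTF-8 gives the bytes seen in the mail.
reencoded = garbled.encode("utf-8")
print(" ".join(f"0x{b:02x}" for b in reencoded))
# prints: 0x4c 0xef 0xbf 0xbd 0x63 0x6b

# For comparison, the same character correctly encoded as UTF-8,
# matching the kcharselect output above.
print(" ".join(f"0x{b:02X}" for b in "\u00fc".encode("utf-8")))
# prints: 0xC3 0xBC
```

Once the substitution has happened the original byte is gone, so the
receiving end cannot recover the u-umlaut from 0xef 0xbf 0xbd alone.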

Anne
-- 
New to KDE4? - get help from http://userbase.kde.org
Just found a cool new feature?  Add it to UserBase