[Kde-pim] codec from a KMime::Message

Thomas McGuire mcguire at kde.org
Wed Oct 28 15:21:26 GMT 2009


Hi,

On Wednesday 28 October 2009 15:37:30 Thomas McGuire wrote:
> For KMail to display the message correctly, it needs to know the charset
> the mail was written in. If the wrong charset is used, the mail will be
> displayed incorrectly, e.g. umlauts are displayed the wrong way. (There is
> even a word for that: Mojibake).
>
> So to display the mail correctly, the charset encoding needs to be known.
> To solve this, the charset encoding is noted down in the mail itself, as
> part of the content-type header.
>
> [..]
> 
> Summary:
> Fallback charset: Charset that is used when the message has no specified
> charset
> Override charset: Charset that is always used, even if the message
> specifies a charset
> MIME part: Mails can consist of multiple parts, which form a tree. Each
> part has headers.

Ok, now let me explain charset vs content-transfer-encoding, for the brave.

The charset is only used when displaying text MIME parts, e.g. text/plain.

But what is content-transfer-encoding?
The problem is that mails are not allowed to use the full byte range from 0 to 
255 while being sent.
However, most attachments, like images, zip files and so on, do use all 256 
byte values. Text encoded with some charsets, for example UTF-8, also uses the 
full byte range from 0 to 255.

Mail sending is constrained to only a part of the byte range, 0 to 127 I 
think. This is fine for ASCII text, since that is 0 to 127 only, but things 
like attachments or UTF-8 text, which use the full byte range, cannot be sent 
as they are.
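
To make the byte range point concrete, here is a small sketch. It is Python, 
picked only for illustration (this is not KMime code), and the helper name 
is_7bit is made up for this example. It checks whether every byte of some 
data fits into 0 to 127:

```python
def is_7bit(data: bytes) -> bool:
    """Return True if every byte value is in the range 0 to 127."""
    return all(b < 128 for b in data)

# "Hello" is plain ASCII, so every byte is below 128.
ascii_bytes = "Hello".encode("utf-8")

# "Gr\u00fc\u00dfe" is "Gruesse" written with u-umlaut and sharp s.
# UTF-8 encodes each of those two characters as two bytes above 127.
umlaut_bytes = "Gr\u00fc\u00dfe".encode("utf-8")

print(is_7bit(ascii_bytes))   # True
print(is_7bit(umlaut_bytes))  # False
print(list(umlaut_bytes))     # [71, 114, 195, 188, 195, 159, 101]
```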

To solve that, anything that uses the full byte range needs to be 
transformed/encoded into something that only uses the first 128 byte values. 
The encoding that does this is called the content-transfer-encoding. These are 
the 4 content-transfer-encodings you will usually come across (RFC 2045 also 
defines a fifth one, "binary", which is rarely seen in mail):

7-bit: This does nothing. It assumes that the input is already in the 0 to 127 
byte value range, and therefore the input can be sent unencoded.

base64: Encodes every group of 3 bytes as 4 human-readable characters from a 
64-character alphabet (A-Z, a-z, 0-9, + and /), so the output is about a third 
larger than the input.

quoted-printable: Encodes each non-ASCII byte as an equals sign followed by 
two hexadecimal digits, e.g. =FC. The advantage over base64 is that nearly all 
ASCII characters remain unchanged, so the result is much more human-readable. 
The disadvantage is that each encoded byte takes three characters, so for 
input with many non-ASCII bytes this scheme has more overhead than base64.

8-bit: This is an exception. Some SMTP servers actually do support sending 
mails which contain the full byte range from 0 to 255. For those SMTP 
servers, input that uses the full byte range does not need to be encoded at 
all, which is called the 8-bit content-transfer-encoding.
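
As a rough illustration of base64 and quoted-printable, here is a sketch that 
uses Python's standard base64 and quopri modules (again just for 
illustration, this is not how KMime does it). The sample word and the choice 
of ISO-8859-15 are assumptions made for the example:

```python
import base64
import quopri

# "Gr\u00fc\u00dfe" in ISO-8859-15 is 5 bytes, two of them above 127.
raw = "Gr\u00fc\u00dfe".encode("iso-8859-15")

b64 = base64.b64encode(raw)      # every 3 bytes become 4 characters
qp = quopri.encodestring(raw)    # only the non-ASCII bytes become =XX

print(raw)   # b'Gr\xfc\xdfe'
print(b64)   # b'R3L832U='
print(qp)    # b'Gr=FC=DFe'

# Both encodings are reversible and give back the original bytes.
assert base64.b64decode(b64) == raw
assert quopri.decodestring(qp) == raw
```

Note how "Gr" and "e" are still readable in the quoted-printable output, while 
the base64 output is not readable at all.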

The rule is: Use 7 bit if possible, since mail clients deal with that best and 
it has no space overhead.
If something doesn't fit into 7 bit, use either base64 or quoted-printable. 
Which of those is best for space usage depends on the input: quoted-printable 
wins for mostly-ASCII text, base64 for binary data. quoted-printable is much 
more human-readable, though.
KMime even has a nice class to detect which is the best content-transfer-
encoding for a given input, see KMime::CharFreq::type().
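
The rule above can be sketched roughly like this. To be clear, this is NOT the 
algorithm KMime::CharFreq uses; it is a naive Python heuristic, and the 
function name choose_cte is made up for this example. It also ignores line 
length limits and the linebreak rules, which a real implementation has to 
check as well:

```python
import base64
import quopri

def choose_cte(data: bytes) -> str:
    """Naive choice of a content-transfer-encoding for the given bytes."""
    # Everything fits into 0..127: no encoding needed.
    if all(b < 128 for b in data):
        return "7bit"
    # Otherwise encode both ways and pick whichever output is shorter.
    qp_len = len(quopri.encodestring(data))
    b64_len = len(base64.b64encode(data))
    return "quoted-printable" if qp_len <= b64_len else "base64"

print(choose_cte(b"Hello"))
print(choose_cte("Gr\u00fc\u00dfe und viele liebe Worte".encode("iso-8859-15")))
print(choose_cte(bytes(range(256))))
```

Plain ASCII comes out as 7bit, the mostly-ASCII German text as 
quoted-printable, and the block containing all 256 byte values as base64.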

(BTW, it is even more fun when dealing with linebreaks, but even I don't know 
the details there)

For text parts, the content-transfer-encoding is applied on top of the charset 
encoding, e.g. first the text is charset-encoded with ISO-8859-15, then it is 
content-transfer-encoded with quoted-printable.
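
Here is a sketch of the two layers in Python (for illustration only, not 
KMime code). The sample text is an assumption: it contains a euro sign, which 
ISO-8859-15 has and ISO-8859-1 does not. The receiver undoes the layers in 
reverse order, and decoding with the wrong charset gives the kind of garbage 
described in the quoted mail:

```python
import quopri

text = "10\u20ac"  # "10" followed by a euro sign

# Sender: first the charset encoding, then the content-transfer-encoding.
charset_bytes = text.encode("iso-8859-15")   # b'10\xa4'
wire = quopri.encodestring(charset_bytes)    # b'10=A4'

# Receiver: first undo the content-transfer-encoding, then the charset.
received_bytes = quopri.decodestring(wire)
decoded = received_bytes.decode("iso-8859-15")

# Using the wrong charset turns the euro sign into a currency sign (U+00A4).
wrong = received_bytes.decode("iso-8859-1")

print(wire)             # b'10=A4'
print(decoded == text)  # True
print(wrong == text)    # False
```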

Non-text parts, like attachments, don't need a charset encoding, since there 
is no text to display. Those parts are encoded with the content-transfer-
encoding only.
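
For comparison, this is what a non-text part looks like when built with the 
email package from Python's standard library (an illustration, not KMime 
code; the payload is just dummy data made up for the example). The part gets a 
content-transfer-encoding header, but no charset:

```python
from email.mime.application import MIMEApplication

# Dummy binary payload that uses all 256 byte values.
data = bytes(range(256))

# MIMEApplication base64-encodes the payload by default.
part = MIMEApplication(data, "octet-stream")

print(part["Content-Type"])               # application/octet-stream
print(part["Content-Transfer-Encoding"])  # base64
print(part.get_content_charset())         # None

# Decoding the content-transfer-encoding gives back the original bytes.
assert part.get_payload(decode=True) == data
```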

Content-transfer-encodings are specified in RFC 2045, section 6.

Regards,
Thomas