[Kde-pim] codec from a KMime::Message

Thomas McGuire mcguire at kde.org
Wed Oct 28 14:37:30 GMT 2009


Hi Laurent,

On Wednesday 28 October 2009 10:49:00 laurent Montel wrote:
> in kmail4 we used KMMessage::codec() to get codec.
> Now in kmail-akonadi we use KMime::Message.
> Which is the function to get codec ?

Let me first explain a bit what codecs/charsets/encodings are about. Mails can
be written in different charsets, for example in ASCII, UTF-8, ISO-8859-15 and
many others.

For KMail to display the message correctly, it needs to know the charset the 
mail was written in. If the wrong charset is used, the mail will be displayed 
incorrectly, e.g. umlauts are displayed the wrong way. (There is even a word 
for that: Mojibake).
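
To make the umlaut case concrete, here is a minimal sketch in plain C++ with
no KMime or Qt involved. latin1BytesToUtf8() is a hypothetical helper, not an
existing API: it treats every input byte as an ISO-8859-1 character, which is
what a reader effectively does when it assumes the wrong charset for a UTF-8
mail.

```cpp
#include <string>

// Hypothetical helper: treat every input byte as an ISO-8859-1 character
// and re-encode it as UTF-8. Feeding it UTF-8 input reproduces the mojibake
// a reader shows when it assumes the wrong charset.
std::string latin1BytesToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // plain ASCII, unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // two-byte UTF-8 lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

An "ü" stored as UTF-8 is the two bytes C3 BC. Read as ISO-8859-1, those are
the two characters "Ã" and "¼", so the reader shows "Ã¼" instead of "ü".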

So to display the mail correctly, the charset needs to be known. To solve
this, the charset is noted down in the mail itself, as a parameter of the
Content-Type header. Look at the source of your mail: you'll see that it uses
the iso-8859-1 charset.
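
As an illustration of where that information lives, the following is a
simplified sketch in plain C++ that pulls the charset parameter out of a
Content-Type header value. charsetFromContentType() is a hypothetical name and
the parsing is deliberately naive (it ignores comments, escaped quotes and
whitespace around the "="); KMime has its own header parsing, which this does
not try to reproduce.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical, simplified parser: returns the lower-cased value of the
// "charset" parameter of a Content-Type header value, or an empty string
// if there is none.
std::string charsetFromContentType(const std::string &headerValue)
{
    // Parameter names and charset names are case-insensitive, so work on
    // a lower-cased copy.
    std::string lower(headerValue);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::string::size_type pos = lower.find("charset=");
    if (pos == std::string::npos)
        return std::string();
    pos += 8; // length of "charset="

    const bool quoted = pos < lower.size() && lower[pos] == '"';
    if (quoted)
        ++pos;

    std::string value;
    while (pos < lower.size()) {
        const char c = lower[pos++];
        // A quoted value ends at the closing quote, an unquoted one at
        // ';' or whitespace.
        if (quoted ? c == '"' : (c == ';' || std::isspace(static_cast<unsigned char>(c))))
            break;
        value += c;
    }
    return value;
}
```

For a header value such as text/plain; charset="iso-8859-1" this returns
"iso-8859-1"; for a bare text/plain it returns an empty string, which is
exactly the case the fallback charset is meant for.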

Now, some mail clients unfortunately don't specify the charset the mail was 
written in, or even specify the wrong charset. But to display the mail 
correctly, we need to use the correct charset.
To solve this, KMail has a fallback character encoding. When KMail displays a
message which has no charset specified, it uses the fallback charset instead.
The fallback charset can be set by the user in the settings under
Appearance->Message Window. By default, the fallback charset is the local
system encoding, since the user will likely communicate mostly with users of
the same language.
(According to RFC 2045, a missing charset should mean US-ASCII, but because of
all the incorrect mailers out there, that cannot be relied on in practice;
hence the fallback encoding.)

Then, KMail also has the option to set an override charset. This is used even
when a message has a charset specified: it simply overrides the charset
specified in the message. The override charset can also be set in the settings
under Appearance->Message Window, and in the View menu under "Set
Encoding". If the override charset is set to "Auto", KMail does not override
anything: it uses the charset specified in the message, or the fallback
charset if the message does not specify one.
The override charset can be set on a per-message basis with 
KMMessage::setOverrideCodec().
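
The order of precedence described above can be sketched in a few lines of
plain C++. CharsetSettings and effectiveCharset() are hypothetical names made
up for this illustration, not KMail or KMime API; an empty string stands for
"not set", and an override of "Auto" is modelled as an empty overrideCharset.

```cpp
#include <string>

// Hypothetical stand-in for the two user settings; not a KMail/KMime type.
struct CharsetSettings {
    std::string overrideCharset;  // empty means "Auto", i.e. no override
    std::string fallbackCharset;  // used when the message declares nothing
};

// Picks the charset to decode with: override first, then the charset
// declared in the message, then the fallback.
std::string effectiveCharset(const CharsetSettings &settings,
                             const std::string &declaredCharset)
{
    if (!settings.overrideCharset.empty())
        return settings.overrideCharset;  // wins even over a declared charset
    if (!declaredCharset.empty())
        return declaredCharset;           // what the message itself says
    return settings.fallbackCharset;      // message is silent: use the fallback
}
```

With no override and a fallback of iso-8859-15, a message declaring utf-8 is
decoded as utf-8 and a message declaring nothing as iso-8859-15; as soon as an
override is set, both are decoded with the override.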

Mails can also consist of multiple MIME parts. Your mail was just a single 
text/plain MIME part, but some mails have more parts (think attachments or 
HTML mail).
The KMime class representing a single MIME part is KMime::Content. For those 
messages only consisting of a single MIME part, KMime::Message is the main and 
only MIME part (it inherits KMime::Content). For mails with multiple MIME 
parts, the main MIME part/KMime::Content can have child parts/contents, see 
KMime::Content::contents(). Therefore, the MIME parts form a tree. Each 
part/content can have headers, see KMime::Content::head().
All MIME parts that should be displayed as text should have a content-type 
header with a charset parameter to specify how they should be displayed.
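
To picture the tree, here is a much-reduced model in plain C++. Part and
collectTextCharsets() are hypothetical and only mimic the shape of the data;
the real class is KMime::Content, and nothing here is meant to reflect how
KMime is implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical, much-reduced model of a MIME part.
struct Part {
    std::string contentType;     // e.g. "text/plain"
    std::string charset;         // empty if the part declares none
    std::vector<Part> children;  // empty for a leaf part
};

// Depth-first walk that collects the declared charset of every text/* part,
// in the order the parts appear in the tree.
void collectTextCharsets(const Part &part, std::vector<std::string> &out)
{
    if (part.contentType.compare(0, 5, "text/") == 0)
        out.push_back(part.charset);
    for (const Part &child : part.children)
        collectTextCharsets(child, out);
}
```

For a multipart/alternative root with a text/plain child declaring utf-8 and a
text/html child declaring nothing, the walk yields "utf-8" followed by an
empty string. The empty entry is the part that would need the fallback
charset.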

Summary:
Fallback charset: Charset that is used when the message has no specified 
charset
Override charset: Charset that is always used, even if the message specifies a 
charset
MIME part: Mails can consist of multiple parts, which form a tree. Each part 
has headers.

Now finally to your question:
-----------------------------

KMMessage::codec() takes into account the override charset and the fallback
charset; have a look at the source. codec() internally calls charset(), which
looks into the headers to see if there is a Content-Type header with a charset
parameter.
KMime::Content does support fallback encoding, see 
KMime::Content::setDefaultCharset().
KMime::Content also supports override encoding, see 
KMime::Content::setForceDefaultCharset().

If you set the fallback/override charset, that should work automatically in 
KMime, e.g. KMime::Content::decodedText() will take the fallback and override 
charset into account.
If you really need the QTextCodec that is used to decode and encode the 
charset, you'd need to write that method yourself.
(I'm not sure if KMime::Content::setDefaultCharset() and 
setForceDefaultCharset() actually propagate the charset to the child 
parts/contents.)

Side note 1: All this is completely unrelated to content-transfer-encoding, 
which is something else.

Side note 2: I've been a bit sloppy with the terms charset, encoding and codec 
in this mail, hope you still get the idea.

Phew, long mail.

Regards,
Thomas