[Kde-pim] codec from a KMime::Message
Thomas McGuire
mcguire at kde.org
Wed Oct 28 14:37:30 GMT 2009
Hi Laurent,
On Wednesday 28 October 2009 10:49:00 laurent Montel wrote:
> in kmail4 we used KMMessage::codec() to get codec.
> Now in kmail-akonadi we use KMime::Message.
> Which is the function to get codec ?
Let me first explain a bit what codecs/charsets/encodings are about. Mails can
be written in different charsets, for example in ASCII, UTF-8, ISO-8859-15 and
many others.
For KMail to display the message correctly, it needs to know the charset the
mail was written in. If the wrong charset is used, the mail will be displayed
incorrectly, e.g. umlauts are displayed the wrong way. (There is even a word
for that: Mojibake).
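
To make the effect concrete, here is a minimal sketch in plain standard C++ (no Qt or KMime involved). The helper name latin1ToUtf8 is made up for this illustration. It reads bytes as if they were ISO-8859-1 and re-encodes them as UTF-8, which is what goes wrong when a UTF-8 mail is decoded with the wrong charset:

```cpp
#include <string>

// Hypothetical helper for illustration only: interpret each byte as an
// ISO-8859-1 (Latin-1) character and re-encode the result as UTF-8.
// Every Latin-1 byte maps to the Unicode code point with the same value.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII range: identical in both encodings.
            out += static_cast<char>(c);
        } else {
            // Code points 0x80..0xFF need two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Feeding it the two UTF-8 bytes of "ü" (0xC3 0xBC) yields the four bytes of "Ã¼", the typical garbled umlaut. Feeding it the single Latin-1 byte 0xFC yields a correct UTF-8 "ü".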
So to display the mail correctly, the charset encoding needs to be known. To
solve this, the charset encoding is noted down in the mail itself, as part of
the content-type header. If you look at the source of your mail, you'll see
that it uses the iso-8859-1 charset.
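
As a rough sketch of what "noted down in the content-type header" means, the following standard C++ function pulls the charset parameter out of a header value such as text/plain; charset="iso-8859-1". The function name charsetFromContentType is invented for this example, and the parsing is deliberately naive (it just searches for "charset="). Real code should use the header parsing that KMime already provides.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return the lower-cased charset parameter of a
// Content-Type header value, or an empty string if there is none.
// Simplification: a plain substring search, not a full MIME parameter parser.
std::string charsetFromContentType(const std::string &value)
{
    std::string lower(value);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos)
        return std::string();
    pos += key.size();

    // The value is either quoted ("utf-8") or a bare token (utf-8).
    if (pos < lower.size() && lower[pos] == '"') {
        const std::string::size_type end = lower.find('"', pos + 1);
        if (end == std::string::npos)
            return lower.substr(pos + 1);
        return lower.substr(pos + 1, end - pos - 1);
    }
    const std::string::size_type end = lower.find_first_of("; \t", pos);
    if (end == std::string::npos)
        return lower.substr(pos);
    return lower.substr(pos, end - pos);
}
```

An empty result corresponds to the "no charset specified" case discussed next.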
Now, some mail clients unfortunately don't specify the charset the mail was
written in, or even specify the wrong charset. But to display the mail
correctly, we need to use the correct charset.
To solve this, KMail has a fallback character encoding. When KMail displays a
message which has no charset specified, it uses the fallback charset instead.
The fallback charset can be set by the user in the settings under
Appearance->Message Window. By default, the fallback charset is the local
system encoding, since the user will likely communicate mostly with users who
use the same language.
(By RFC, a mail that does not specify a charset should be treated as ASCII,
but because of all the incorrect mailers out there, that is not true in
practice, hence the fallback encoding.)
Then, KMail also has the option to set an override charset. This is used even
when a message has a charset specified: it simply overrides the charset
specified in the message. The override charset can also be set in the settings
under Appearance->Message Window, and also in the View menu under "Set
Encoding". If the override charset is set to "Auto", KMail will not override
the charset which is specified in the message and use the charset from the
message instead, or the fallback charset if the message does not specify a
charset.
The override charset can be set on a per-message basis with
KMMessage::setOverrideCodec().
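
The precedence between the three charsets can be sketched like this in plain standard C++. This is not KMail's actual code: the function effectiveCharset is made up for this illustration, and representing the "Auto" setting as the literal string "Auto" is an assumption.

```cpp
#include <string>

// Hypothetical helper: decide which charset to decode a message with.
//   messageCharset  - charset from the mail's content-type header, or empty
//   fallbackCharset - user-configured fallback
//   overrideCharset - user-configured override, or "Auto" (assumed spelling)
std::string effectiveCharset(const std::string &messageCharset,
                             const std::string &fallbackCharset,
                             const std::string &overrideCharset)
{
    // A real override wins over whatever the message says.
    if (!overrideCharset.empty() && overrideCharset != "Auto")
        return overrideCharset;
    // With "Auto", trust the message if it specifies a charset...
    if (!messageCharset.empty())
        return messageCharset;
    // ...and use the fallback only when it does not.
    return fallbackCharset;
}
```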
Mails can also consist of multiple MIME parts. Your mail was just a single
text/plain MIME part, but some mails have more parts (think attachments or
HTML mail).
The KMime class representing a single MIME part is KMime::Content. For those
messages only consisting of a single MIME part, KMime::Message is the main and
only MIME part (it inherits KMime::Content). For mails with multiple MIME
parts, the main MIME part/KMime::Content can have child parts/contents, see
KMime::Content::contents(). Therefore, the MIME parts form a tree. Each
part/content can have headers, see KMime::Content::head().
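
To illustrate the tree structure, here is a small sketch in standard C++. The struct Part is a simplified stand-in invented for this example, not the real KMime::Content class; it only keeps the content type, the charset and the child parts. The walk visits the main part first and then recurses into its children.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a MIME part (NOT the real KMime::Content).
struct Part {
    std::string contentType;     // e.g. "text/plain" or "multipart/mixed"
    std::string charset;         // empty if the part does not specify one
    std::vector<Part> children;  // child parts; empty for a leaf
};

// Depth-first walk that collects every part whose type starts with "text/".
void collectTextParts(const Part &part, std::vector<const Part *> &out)
{
    if (part.contentType.compare(0, 5, "text/") == 0)
        out.push_back(&part);
    for (const Part &child : part.children)
        collectTextParts(child, out);
}
```

For a multipart/mixed mail with a text/plain body, an image attachment and a nested text/html alternative, the walk returns the two text parts in document order.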
All MIME parts that should be displayed as text should have a content-type
header with a charset parameter to specify how they should be displayed.
Summary:
Fallback charset: Charset that is used when the message has no specified
charset
Override charset: Charset that is always used, even if the message specifies a
charset
MIME part: Mails can consist of multiple parts, which form a tree. Each part
has headers.
Now finally to your question:
-----------------------------
KMMessage::codec() takes into account the override charset and the fallback
charset; have a look at the source. codec() internally calls charset(), which
looks into the headers to see if there is a content-type header with a charset
parameter.
KMime::Content does support fallback encoding, see
KMime::Content::setDefaultCharset().
KMime::Content also supports override encoding, see
KMime::Content::setForceDefaultCharset().
If you set the fallback/override charset, that should work automatically in
KMime, e.g. KMime::Content::decodedText() will take the fallback and override
charset into account.
If you really need the QTextCodec that is used to decode and encode the
charset, you'd need to write that method yourself.
(I'm not sure if KMime::Content::setDefaultCharset() and
setForceDefaultCharset() actually propagate the charset to the child
parts/contents.)
Side note 1: All this is completely unrelated to content-transfer-encoding,
which is something else.
Side note 2: I've been a bit sloppy with the terms charset, encoding and codec
in this mail, hope you still get the idea.
Phew, long mail.
Regards,
Thomas