[kde-linux] Encoding questions (Chusslove Illich)
Emanoil Kotsev
deloptes at yahoo.com
Thu Jun 12 14:12:54 UTC 2008
> On Monday 09 June 2008 08:14 am, Emanoil Kotsev wrote:
> > The encoding for the merriam-webster page seems to be
> > iso8859-1.
The site is definitely iso-8859-1 encoded.
>
> One thing I noticed the other day, but forgot to
> mention: Yes, there is
> something on that page that seems to say the page is
> encoded in iso8859-1:
>
> <meta http-equiv="Content-Type" content="text/html;
> charset=iso-8859-1" />
>
> But elsewhere on the same page there are lines that
> suggest that at least some
> part of it might be encoded in utf-8:
>
> google_afs_ie = 'utf8'; // select input encoding scheme
> google_afs_oe = 'utf8'; // select output encoding scheme
These are properties/arguments of the Google ads
function. I guess they are used to convert the encoding
of the ad data into the one you are using.
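To see what the page itself declares, as opposed to what an embedded script sets, one can read the charset from the Content-Type meta tag. Below is a minimal sketch in Python using only the standard library; the names MetaCharsetParser and declared_charset and the shortened sample page are made up for illustration and are not taken from the m-w.com source.

```python
from html.parser import HTMLParser


class MetaCharsetParser(HTMLParser):
    """Remember the charset declared in a Content-Type <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "content-type":
            return
        # The content attribute looks like "text/html; charset=iso-8859-1".
        for part in (attrs.get("content") or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.lower() == "charset":
                self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the charset named in the meta tag, or None if there is none."""
    parser = MetaCharsetParser()
    parser.feed(html_text)
    return parser.charset


# Hypothetical, shortened stand-in for the page discussed in this thread.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />'
    "<script>google_afs_ie = 'utf8'; google_afs_oe = 'utf8';</script>"
)
print(declared_charset(page))
```

The script variables are ignored here on purpose: they are plain JavaScript assignments and do not change how the browser decodes the document.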
>
> My guess is that the definition is fetched from some
> database and displayed
> using utf-8. (On the other hand, maybe the utf-8 is
> only for google ads or
> similar displayed on that page?) (A further guess
> is that, if the
> pronunciation key were displayed in iso-8859-1 ...
>
> Well, wait--the one clue I have is that if I C&P the
> definition from konqueror
> to kate, with kate changed to a font that can
> display the correct glyphs (the
> upside down e, for example), the pronunciation key
> is displayed correctly in
> kate. Would that work if the encoding on the
> konqueror page was iso-8859-1,
> or only if it was utf-8? I'm not sure, and don't
> desperately need to know at
> the moment. ;-)
>
> Just for reference, here is a C&P of the
> pronunciation "key" from one m-w page
> (http://www.merriam-webster.com/dictionary/intelligent):
>
> Pronunciation: \in-ˈte-lə-jənt\
>
> I guess I just wanted to note that there is some
> uncertainty, at least in my
> mind, as to whether the definition on the m-w.com
> pages is encoded in
> iso-8859-1 or utf-8. If it is encoded in
> iso-8859-1, could it be displayed
> properly if C&P'd into kate?
>
Look at the source code of the page and you'll find the
secret:
<dt class="pron">Pronunciation:</dt>
<dd class="pron">
<span class="pronchars">\in-<span
class="unicode">ˈ</span>te-lə-jənt\</span>
</dd>
This means they follow the W3C recommendation for encoding
characters in HTML by their Unicode definition (character
references), which works independently of the iso-8859-1
charset the page declares.
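A small Python sketch of why this works, and why the C&P into kate comes out right. It assumes the page source carries the two special characters as numeric character references (the snippet above shows them already decoded, so that part is my assumption). The variable names are only for illustration.

```python
import html

# The pronunciation key as the browser shows it: \in-ˈte-lə-jənt\
pron = "\\in-\u02c8te-l\u0259-j\u0259nt\\"

# ISO-8859-1 has no code for U+02C8 or U+0259, so a direct encode fails.
try:
    pron.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# Writing the missing characters as numeric character references keeps
# every byte inside ISO-8859-1 (in fact inside plain ASCII here).
as_refs = pron.encode("iso-8859-1", errors="xmlcharrefreplace")

# A browser decodes the bytes with the declared charset, then resolves
# the references to Unicode characters; html.unescape mimics that step.
decoded = html.unescape(as_refs.decode("iso-8859-1"))

print(fits_latin1)
print(as_refs)
print(decoded == pron)
```

Once the browser has resolved the references, the text it holds is ordinary Unicode, so pasting it into kate only needs a font with the right glyphs, whatever charset the page declared.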
Welcome to the encodings hell!
regards