[kde-linux] Encoding questions (Chusslove Illich)

Emanoil Kotsev deloptes at yahoo.com
Thu Jun 12 14:12:54 UTC 2008


> On Monday 09 June 2008 08:14 am, Emanoil Kotsev wrote:
> > The encoding for the merriam-webster page seems to be
> > iso8859-1.

The site is definitely iso8859-1 encoded.

> 
> One thing I noticed the other day, but forgot to
> mention:  Yes, there is 
> something on that page that seems to say the page is
> encoded in iso8859-1:
> 
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
> 
> But elsewhere on the same page there are lines that
> suggest that at least some 
> part of it might be encoded in utf-8:
> 
> google_afs_ie     = 'utf8';                  // select input encoding scheme
> google_afs_oe     = 'utf8';                  // select output encoding scheme

These are properties/arguments of the Google ads
function. I guess they are used to convert the encoding
of the ad data into the one you are using, so they
don't tell you how the page itself is encoded.

> 
> My guess is that the definition is fetched from some
> database and displayed 
> using utf-8.  (On the other hand, maybe the utf-8 is
> only for google ads or 
> similar displayed on that page?)  (A further guess
> is that, if the 
> pronunciation key were displayed in iso-8859-1 ...
> 
> Well, wait--the one clue I have is that if I C&P the
> definition from konqueror 
> to kate, with kate changed to a font that can
> display the correct glyphs (the 
> upside down e, for example), the pronunciation key
> is displayed correctly in 
> kate.  Would that work if the encoding on the
> konqueror page was iso-8859-1, 
> or only if it was utf-8?  I'm not sure, and don't
> desperately need to know at 
> the moment. ;-)
> 
> Just for reference, here is a C&P of the
> pronunciation "key" from one m-w page 
> (http://www.merriam-webster.com/dictionary/intelligent):
> 
> Pronunciation: \in-?te-l?-j?nt\
> 
> I guess I just wanted to note that there is some
> uncertainty, at least in my 
> mind, as to whether the definition on the m-w.com
> pages is encoded in 
> iso-8859-1 or utf-8.  If it is encoded in
> iso-8859-1, could it be displayed 
> properly if C&P'd into kate?  
> 

Look at the source code of the page and you'll find the
secret:

    <dt class="pron">Pronunciation:</dt>

    <dd class="pron">
      <span class="pronchars">\in-<span class="unicode">ˈ</span>te-lə-jənt\</span>
    </dd>

This means they use the W3C recommendation for encoding
characters in HTML from the Unicode definition, so the
phonetic symbols can be shown even though the page
itself is declared as iso-8859-1.
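
To make the idea concrete, here is a rough sketch in Python. It is only
an illustration: the `fragment` string is hypothetical markup, assuming
the phonetic symbols are written as numeric character references such
as &#601; (I have not copied it from the real page). It shows that such
markup fits in iso-8859-1 bytes, while the symbols themselves do not.

```python
import html

# Hypothetical markup (an assumption, not copied from the real page):
# the phonetic symbols written as numeric character references.
fragment = "\\in-&#712;te-l&#601;-j&#601;nt\\"

# Every character of the fragment fits in ISO-8859-1,
# so the charset declared in the meta tag still holds.
raw_bytes = fragment.encode("iso-8859-1")

# A browser decodes the bytes with the declared charset first,
# then resolves the references into Unicode characters.
decoded = html.unescape(raw_bytes.decode("iso-8859-1"))
print(decoded)

# The resolved symbols have no ISO-8859-1 representation at all.
try:
    decoded.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# Going the other way: turn the symbols back into references.
as_refs = decoded.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Once the browser has resolved the references, what you copy is ordinary
Unicode text, which should be why the paste into kate shows the right
glyphs as long as the font has them.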

Welcome to the encodings hell!

regards





      


