Fwd: Re: selectTextAsHTML()

John Tapsell john at geola.co.uk
Sat Jun 4 16:22:28 BST 2005


Hi,
  Could someone enlighten me and garycramblitt (phantomsdad) about what 
toString() etc. should do?  See the forwarded email from him below.

Thanks
JohnFlux

----------  Forwarded Message  ----------

Subject: Re: selectTextAsHTML()
Date: Friday 03 June 2005 17:22
From: Gary Cramblitt <garycramblitt at comcast.net>
To: John Tapsell <johnflux at geola.co.uk>

On Friday 03 June 2005 10:26 am, John Tapsell wrote:
> Gary Cramblitt wrote:
> >On Friday 03 June 2005 04:55 am, you wrote:
> >>What does document().toString() return at the moment?  I can't remember
> >>if that returns the tags or just the text.
> >
> >It returns tags and text, but it is not well-formed xhtml, unlike the
> >output of selectedTextAsHTML().
>
> This sounds like a bug.  Can you give me an example of what it does wrong?
> Please note that toString will also most likely not give safe urls - the
> password will probably be left in.

Here's a portion of query = part->document().toString().string() output from
http://www.google.com/:

konqueror: KHTMLPluginKTTSD::slotReadOut: query =
 <HTML><HEAD><SCRIPT>function
 PrivoxyWindowOpen(){return(null);}</SCRIPT><META http-equiv="content-type"
 content="text/html; charset=ISO-8859-1"><TITLE>Google</TITLE><STYLE><!--
 body,td,a,p,.h{font-family:arial,sans-serif;}
.q{color:#0000cc;}
//--></STYLE><SCRIPT>
<!--
function sf(){document.f.q.focus();}
// --></SCRIPT></HEAD><BODY bgcolor="#ffffff" text="#000000"
 link="#0000cc" vlink="#551a8b" alink="#ff0000" onload="sf()" topmargin="3"
marginheight="3"><CENTER><IMG src="/intl/en/images/logo.gif" width="276"
height="110" alt="Google"><BR><BR>

There are several xhtml violations in the fragment above:

1.  Tags are uppercase.  They must be lowercase.
2.  The META tag is not closed.
3.  The BR tags are not closed.  Should be <br/>.
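
As a rough illustration of how these violations differ in practice, here is
a minimal sketch using Python's standard xml.etree.ElementTree parser as a
stand-in for any XML parser.  The three sample strings are shortened,
closed-off variants of the fragment above and are assumptions made for the
example, not actual toString() output.  Uppercase tag names alone do not
stop an XML parser (they are a problem for XHTML validity, not for
well-formedness), whereas an unclosed META or BR does.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples modelled on the fragment above (shortened and closed off).
uppercase_but_closed = (
    '<BODY><CENTER><IMG src="logo.gif" alt="Google"/><BR/><BR/></CENTER></BODY>'
)
unclosed_br = (
    '<body><center><img src="logo.gif" alt="Google"/><br><br></center></body>'
)
unclosed_meta = (
    '<html><head><meta http-equiv="content-type" '
    'content="text/html; charset=ISO-8859-1"><title>Google</title></head></html>'
)

print(is_well_formed(uppercase_but_closed))  # uppercase names still parse
print(is_well_formed(unclosed_br))           # <br> never closed
print(is_well_formed(unclosed_meta))         # <meta> never closed
```

The parser only checks that tags nest and close consistently; it does not
validate against the XHTML DTD, so the uppercase names pass here.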

Here's what I see when I choose View Source in Konqi (same portion of web
page):

<html><head><script>function PrivoxyWindowOpen(){return(null);}</script><meta
http-equiv="content-type" content="text/html;
charset=ISO-8859-1"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif;}
.q{color:#0000cc;}
//-->
</style>
<script>
<!--
function sf(){document.f.q.focus();}
// -->
</script>
</head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b
alink=#ff0000 onLoad=sf() topmargin=3 marginheight=3><center><img
src="/intl/en/images/logo.gif" width=276  height=110 alt="Google"><br><br>

Notice that the tags are lowercase, but the meta and br tags are still not
closed.  BTW, Google doesn't claim this is xhtml.

Here's the same portion output using the selectAll/selectedTextAsHTML method:

konqueror: KHTMLPluginKTTSD::slotReadOut: query = <!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<body><CENTER><IMG src="http://www.google.com/intl/en/images/logo.gif"
width="276" height="110" alt="Google"/><BR/><BR/>

Notice that it didn't output <html> or <head> (including META) at all.  The
case of the tags is still wrong (I can live with that), but the <BR> tags are
properly closed.

My main goal is that the text is well-formed enough that I can run it through
xsltproc to apply my xhtml-to-SSML stylesheet without xsltproc gagging and
quitting.  Unclosed tags, in particular, will cause xsltproc to error out.
selectedTextAsHTML() seems to meet that requirement.
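
One way to catch a bad document before it ever reaches xsltproc is a quick
pre-flight parse.  The sketch below is only an illustration under stated
assumptions: it uses Python's standard xml.etree.ElementTree rather than
anything in KHTML or KTTSD, the helper name first_parse_error is made up
for the example, and the sample strings are shortened, closed-off variants
of the output shown earlier.  It reports where the first well-formedness
error occurs, or None if the text parses.

```python
import xml.etree.ElementTree as ET


def first_parse_error(markup: str):
    """Return (line, column) of the first XML parse error, or None if none."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None


# Hypothetical sample in the style of the selectedTextAsHTML() output,
# closed off so that it forms a complete document.
closed_sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"DTD/xhtml1-strict.dtd">\n'
    '<body><CENTER><IMG src="http://www.google.com/intl/en/images/logo.gif" '
    'width="276" height="110" alt="Google"/><BR/><BR/></CENTER></body>'
)

# Hypothetical sample in the style of the toString() output, with <BR> left open.
unclosed_sample = '<BODY><CENTER><BR><BR></CENTER></BODY>'

print(first_parse_error(closed_sample))    # None when the text is well-formed
print(first_parse_error(unclosed_sample))  # (line, column) of the first error
```

The parser here does not fetch or apply the DTD named in the DOCTYPE, so
this checks well-formedness only, which is the property xsltproc needs in
order to load the input at all.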

--
Gary Cramblitt (aka PhantomsDad)
KDE Text-to-Speech Maintainer
http://accessibility.kde.org/developer/kttsd/index.php

-------------------------------------------------------




More information about the kfm-devel mailing list