using KHTML without display

Nom Declavier achats at blarg.net
Mon May 2 02:31:27 BST 2005


I'd like to use KHTML to parse HTML/CSS/Javascript, and to deduce the
sizes and positions of Web page constituents when the page is rendered
by a KHTML-based browser. But I want to do this without actually
rendering to any screen, and without invoking more browser functionality
than I need. My planned application has no graphical user interface and
displays nothing. It's all about trees whose nodes may be annotated with
size and position information, so I expect to call getRect() frequently.
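
To make the goal concrete, here is a minimal, self-contained sketch of the
kind of annotated tree I have in mind. It does not use KHTML at all: Rect,
LayoutNode, addChild, and countEmptyRects are hypothetical names made up for
illustration, and Rect only stands in for whatever getRect() would return.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the rectangle a real getRect() would return.
struct Rect {
    int x, y, width, height;
    // A zero-sized rectangle is treated as "no layout information".
    bool isEmpty() const { return width == 0 || height == 0; }
};

// Hypothetical tree node: a name, its geometry, and its children.
struct LayoutNode {
    std::string name;
    Rect rect;
    std::vector<std::unique_ptr<LayoutNode>> children;

    LayoutNode(std::string n, Rect r) : name(std::move(n)), rect(r) {
        // Sizes may be zero (not laid out) but never negative.
        assert(r.width >= 0 && r.height >= 0);
    }

    // Appends a child and returns a non-owning pointer to it.
    LayoutNode* addChild(std::string n, Rect r) {
        children.push_back(
            std::unique_ptr<LayoutNode>(new LayoutNode(std::move(n), r)));
        return children.back().get();
    }
};

// Depth-first walk that counts the nodes lacking usable geometry.
int countEmptyRects(const LayoutNode& node) {
    int count = node.rect.isEmpty() ? 1 : 0;
    for (const auto& child : node.children)
        count += countEmptyRects(*child);
    return count;
}
```

Calling countEmptyRects on the root gives a quick check of whether layout
really happened. If "no useful information" turns out to mean empty
rectangles (an assumption on my part), a count close to the total number of
nodes would signal that nothing was laid out.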

So what I really need is DOM::Document and so on. I'll use KApplication,
KHTMLPart, KHTMLView, and so on, only as I need them to invoke the
functionality of DOM, CSS, and KJS classes.

I'm aware of two ways to get a DOM::HTMLDocument from an HTML file, without
getting into windows and widgets.

Technique 1 looks like this:

DOM::HTMLDocument doc;
doc.setAsync(false);   // load synchronously: load() blocks until done
doc.load(url);

Technique 1 has two serious problems. First, when getRect() is called on
nodes, it produces no useful information. Second, when the program
exits, either the automatically-invoked destructors bomb, or if I invoke
destructors myself, they still bomb.

Technique 2 looks like this, where inputHTMLQString is a QString holding
the contents of the HTML file (how it was read doesn't matter):

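KHTMLPart * pPart = new KHTMLPart();    // no parent widget supplied
pPart->begin();                         // start a new, empty document
pPart->write(inputHTMLQString);         // feed the HTML source to the parser
pPart->end();                           // signal that all input is written
DOM::Document doc = pPart->document();  // DOM tree built from the input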

Technique 2 also has two serious problems. First, when getRect() is
called on nodes, it produces no useful information, so there's nothing to
choose between Technique 1 and Technique 2 here. Second, although
Technique 2 leads to graceful destruction, it brings along by default a
very fussy version of the HTML parser, which wreaks havoc with scripts,
among other constituents. I can get around the fussy parser, but the way
I've done it so far isn't pretty.

If I want to have all of the following:

  - calls to getRect() produce useful results
  - tolerant parser
  - effective destruction

and I want to have them, to the extent possible, without windows and
widgets, I'm guessing the best technique isn't either of the ones I've
tried. Best aside, what's a good technique?
