using KHTML without display
Nom Declavier
achats at blarg.net
Tue May 3 17:30:22 BST 2005
Thanks very much for the xfake suggestion. Suppressing the display is part of what I'm after. The more critical requirement is to dispense with functionality that my application doesn't need. I want the application to be as small and fast as possible. What's the minimal context that allows the DOM, CSS, and KJS classes to be used for parsing and measurement?
----- Original Message -----
From: Nom Declavier
To: kfm-devel at kde.org
Sent: Sunday, May 01, 2005 6:31 PM
Subject: using KHTML without display
I'd like to use KHTML to parse HTML/CSS/JavaScript, and to deduce the
sizes and positions of Web page constituents when the page is rendered
by a KHTML-based browser. But I want to do this without actually
rendering to any screen, and without invoking more browser functionality
than I need. My planned application has no graphical user interface and
displays nothing. It's all about trees whose nodes may be annotated with
size and position information. I expect to call getRect() frequently.
So what I really need is DOM::Document and so on. I'll use KApplication,
KHTMLPart, KHTMLView, and so on, only as I need them to invoke the
functionality of DOM, CSS, and KJS classes.
I'm aware of two ways to get a DOM::HTMLDocument from an HTML file, without
getting into windows and widgets.
Technique 1 looks like this:
    DOM::HTMLDocument doc;
    doc.setAsync(false);  // ask load() to finish before returning
    doc.load(url);        // url: location of the HTML file
Technique 1 has two serious problems. First, calling getRect() on its
nodes produces no useful information. Second, the program crashes on
exit: the destructors fail whether they run automatically or I invoke
them myself.
Technique 2 looks like this, where inputHTMLQString is a QString read
from the HTML file (how it is read doesn't matter):
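    // Create a part using the default constructor arguments.
    KHTMLPart *pPart = new KHTMLPart();
    pPart->begin();                  // start a new, empty document
    pPart->write(inputHTMLQString);  // feed it the HTML source
    pPart->end();                    // signal that no more data follows
    DOM::Document doc = pPart->document();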
Technique 2 also has two serious problems. First, calling getRect() on
its nodes produces no useful information either, so there's nothing to
choose between Technique 1 and Technique 2 on that count. Second,
although Technique 2 leads to graceful destruction, by default it brings
along a very fussy version of the HTML parser, which wreaks havoc with
scripts, among other constituents. I can get around the fussy parser,
but the way I've done it so far isn't pretty.
If I want all of the following:
- calls to getRect() that produce useful results
- a tolerant parser
- effective destruction
and I want them, as far as possible, without windows and widgets, then
I'm guessing the best technique is neither of the ones I've tried. Best
aside, what's a good technique?