using KHTML without display

Tue May 3 17:30:22 BST 2005

Thanks very much for the xfake suggestion. Suppressing the display is part of what I'm after. The more critical requirement is to dispense with functionality that my application doesn't need. I want the application to be as small and fast as possible. What's the minimal context that allows the DOM, CSS, and KJS classes to be used for parsing and measurement?
  ----- Original Message ----- 
  From: Nom Declavier 
  To: kfm-devel at kde.org 
  Sent: Sunday, May 01, 2005 6:31 PM
  Subject: using KHTML without display

  I'd like to use KHTML to parse HTML/CSS/Javascript, and to deduce the
  sizes and positions of Web page constituents when the page is rendered
  by a KHTML-based browser. But I want to do this without actually
  rendering to any screen, and without invoking more browser functionality
  than I need. My planned application has no graphical user interface. It
  brings about no display. It's all about trees whose nodes may be
  annotated with size and position information. I expect to call getRect()
  frequently.
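
  A minimal sketch of such an annotated tree, in plain C++ and independent
  of KHTML, might look like the following. Rect, AnnotatedNode, annotate()
  and countMeasured() are hypothetical names used only for illustration;
  in the real application the geometry would come from getRect().

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a rectangle reported by a layout engine.
struct Rect {
    int x;
    int y;
    int width;
    int height;
};

// Hypothetical tree node annotated with layout geometry (not a KHTML class).
struct AnnotatedNode {
    std::string name;                     // e.g. an element's tag name
    bool measured;                        // false until a rectangle is attached
    Rect rect;                            // meaningful only when measured is true
    std::vector<AnnotatedNode> children;

    explicit AnnotatedNode(std::string n)
        : name(std::move(n)), measured(false), rect{0, 0, 0, 0} {}

    // Attach size and position information to this node.
    void annotate(int x, int y, int width, int height) {
        assert(width >= 0 && height >= 0);
        rect = Rect{x, y, width, height};
        measured = true;
    }
};

// Depth-first count of the nodes that carry a rectangle.
int countMeasured(const AnnotatedNode& node) {
    int total = node.measured ? 1 : 0;
    for (const AnnotatedNode& child : node.children) {
        total += countMeasured(child);
    }
    return total;
}
```

  Keeping a separate "measured" flag lets the tree distinguish a node that
  was never measured from one whose rectangle is genuinely empty.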

  So what I really need is DOM::Document and so on. I'll use KApplication,
  KHTMLPart, KHTMLView, and so on, only as I need them to invoke the
  functionality of DOM, CSS, and KJS classes.

  I'm aware of two ways to get a DOM::HTMLDocument from an HTML file, without
  getting into windows and widgets.

  Technique 1 looks like this:

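  #include <dom/html_document.h>   // DOM::HTMLDocument

  DOM::HTMLDocument doc;
  doc.setAsync(false);   // request a synchronous load
  doc.load(url);         // url: location of the HTML file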

  Technique 1 has two serious problems. First, when getRect() is called on
  nodes, it produces no useful information. Second, when the program
  exits, the destructors bomb, whether they are invoked automatically or
  I invoke them myself.

  Technique 2 looks like this, where inputHTMLQString is a QString read
  from the HTML file (how it is read doesn't matter).

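  #include <khtml_part.h>   // KHTMLPart

  KHTMLPart * pPart = new KHTMLPart();     // no parent widget supplied
  pPart->begin();                          // start a new document
  pPart->write(inputHTMLQString);          // feed the markup to the parser
  pPart->end();                            // signal the end of the input
  DOM::Document doc = pPart->document();   // the resulting DOM tree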

  Technique 2 also has two serious problems. First, when getRect() is
  called on nodes, it produces no useful information, so there's nothing
  to choose between Technique 1 and Technique 2 here. Second, although
  Technique 2 leads to graceful destruction, it brings along by default a
  very fussy version of the HTML parser, which wreaks havoc with scripts,
  among other constituents. I can get around the fussy parser, but the way
  I've done it so far isn't pretty.

  If I want to have all of the following:

  calls to getRect() produce useful results
  tolerant parser
  effective destruction

  and I want to have them, to the extent possible, without windows
  and widgets, I'm guessing the best technique isn't either of the ones
  I've tried. Best aside, what's a good technique?
