Modularity of DOM/JS parts

Paul Giannaros ceruleanblaze at gmail.com
Tue Mar 20 19:39:51 GMT 2007


On Tuesday 20 March 2007 12:53, Yan Seiner wrote:
> Paul Giannaros wrote:
> > Hi,
> >
> > I've been working on a project to facilitate web navigation/scraping
> > using a "real" browsing engine to handle (X)HTML/JavaScript/cookies. I
> > began using KHTMLPart with Xvfb (so that it works on X-less servers -- a
> > requirement), but that solution is proving to be increasingly finicky and
> > unextensible. A major problem is how closely it's tied to the graphical
> > side: there are constantly message boxes asking about cookies,
> > remembering passwords, kwallet asking things, etc.
> > Since then I've been looking at different engines to get this to work
> > the 'proper' way -- by separating out the DOM/JS stuff from the graphical
> > side -- and KHTML looks like the nicest.
> >
> > How tied are the graphical and non-graphical bits of KHTML? I'd need to
> > use the HTML parsing and DOM stuff, tie it with KJS, and probably handle
> > cookies myself. The (eventual) goal would be to make it work with
> > QCoreApplication.
>
> Hi Paul:
>
> I'm not sure what your ultimate goal is, but I am intrigued.
>
> I have an embedded panel that uses konqueror embedded as its user
> interface.  We've done a lot of tuning, but still, konq+qt is a pretty
> heavyweight app for a 200 MHz CPU and 32 MB RAM.
>
> We don't need most of konq.  We don't need generic web browsing.  What I
> need is a simple browser that can handle javascript, display tables, and
> GET and PUT forms.  Qt gives us some benefit since we can support i18n
> fairly easily.
>
> Is this something similar to what you're working on?  Or is your goal to
> produce a wget on steroids?

No, unfortunately not the same kind of thing. The end result of this will
(hopefully!) be something that can be run on servers without a display -- it
doesn't matter what the page looks like, as long as tag soup can be parsed
into a meaningful DOM and traversed. It will make web scraping (something
that happens in businesses more than I imagined) much easier.

If you only need to render simple markup like tables, however, surely it would
make sense to build on something like QTextDocument, use Qt for the HTTP
transfers, and plug in a JavaScript engine like KJS or SpiderMonkey yourself?
I suppose if you need form submissions you need a browser that can handle
forms :P, but that doesn't seem like an impossible task.
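
To make the "parse tag soup into a DOM and traverse it, no display needed"
idea concrete, here is a rough sketch. It uses Python's standard html.parser
rather than KHTML, and the names Node, SoupTreeBuilder, parse and links are
made up for illustration. It only tolerates unclosed and stray tags; a real
engine would also apply the HTML error-recovery rules this skips (implicitly
closing an open <p>, for instance).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A hypothetical, minimal DOM node (not KHTML's DOM API)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


class SoupTreeBuilder(HTMLParser):
    """Builds a Node tree while tolerating unclosed and stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Close the nearest open ancestor with this tag; ignore stray closers.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def parse(markup):
    """Parse possibly malformed markup and return the document root."""
    builder = SoupTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def links(root):
    """Collect href values from every <a> element, in document order."""
    return [n.attrs["href"] for n in root.walk()
            if n.tag == "a" and "href" in n.attrs]
```

Calling links(parse(page_source)) on a fetched page would then give the
scraper its next URLs to visit, all without an X server or any widgets.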

>
> --Yan




More information about the kfm-devel mailing list