Some Improvements on Adblocker

David Faure faure at kde.org
Mon Oct 9 18:26:07 BST 2006


On Monday 09 October 2006 18:12, Philipp Hülsdunk wrote:
> On Sunday, 8 October 2006 23:41, David Faure wrote:
> > On Sunday 08 October 2006 12:13, Philipp Hülsdunk wrote:
> > > I will now try to make my idea clear.
> > > My idea of an ad blocker has two parts.
> > > The first part is a block list: it contains regular expressions for
> > > image sources. If a regular expression matches the source of
> > > an image, that image will not be shown. This part is already
> > > implemented in khtml. It is the current ad blocker.
> > > The second part of my idea is to search the HTML code with regular
> > > expressions and replace the matches. These find-and-replace
> > > expressions are stored in a list. This part makes it possible to remove
> > > JavaScript ads too. It should work like Proxomitron or Privoxy.
> > > For more about Proxomitron, see
> > > "http://en.wikipedia.org/wiki/Proxomitron".
> > > What I need to implement this is direct access to the HTML code, so
> > > that I can manipulate it. Could somebody tell me how I can do this? I do
> > > not know much about khtml and its functions.
> >
> > The text comes to khtml in KHTMLPart::write() but that's probably too early
> > to filter it; it needs to be converted to the right encoding first...
> > Better add your hooks into Tokenizer::write which is called by khtmlpart.
> > Make sure to make it fast though, this is speed-critical code...
> 
> for every pair p of regular expressions in the filter list do
>     find the first match of p.first and replace it by p.second

Yes, and regexp matching -is- slow...
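
For reference, the quoted loop could be sketched roughly as below. This is only an illustration: it uses the standard library's std::regex as a stand-in rather than Qt's regexp classes or any khtml types, and FilterRule and applyFilters are hypothetical names, not existing khtml code.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical filter entry: first = pattern to find, second = replacement.
using FilterRule = std::pair<std::regex, std::string>;

// For every rule in the list, replace only the first match of rule.first
// with rule.second, as in the quoted pseudocode.
std::string applyFilters(std::string html, const std::vector<FilterRule>& rules)
{
    for (const FilterRule& rule : rules) {
        html = std::regex_replace(html, rule.first, rule.second,
                                  std::regex_constants::format_first_only);
    }
    return html;
}
```

Every rule scans the text again from the start, so the cost grows with both the number of rules and the size of the input.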

> void XMLTokenizer::write( const TokenizerString &str, bool appendData )
> 
> Can I access the string html code with m_source.data?

write() is called multiple times, because the data arrives in chunks. I realize that
this might not be the best place for matching regexps then, since chunking might split
the HTML in the middle of a possible regexp match. But the loading is all incremental,
so I don't really know where else the search/replace could be done...
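
To illustrate the chunking problem, here is a small sketch that compares filtering each chunk separately with filtering the assembled text. It again uses std::regex as a stand-in, and filterPerChunk and filterWhole are hypothetical helpers, not khtml functions.

```cpp
#include <regex>
#include <string>
#include <vector>

// Filter each chunk on its own, the way a hook in write() would see the data.
std::string filterPerChunk(const std::vector<std::string>& chunks,
                           const std::regex& pattern,
                           const std::string& replacement)
{
    std::string out;
    for (const std::string& chunk : chunks)
        out += std::regex_replace(chunk, pattern, replacement);
    return out;
}

// Join all chunks first and filter the complete text, for comparison.
std::string filterWhole(const std::vector<std::string>& chunks,
                        const std::regex& pattern,
                        const std::string& replacement)
{
    std::string all;
    for (const std::string& chunk : chunks)
        all += chunk;
    return std::regex_replace(all, pattern, replacement);
}
```

With the pattern <img[^>]*> and an img tag split across two chunks, filterWhole removes the tag, while filterPerChunk leaves it in place because neither chunk contains a complete match.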

-- 
David Faure, faure at kde.org, sponsored by Trolltech to work on KDE,
Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).




More information about the kfm-devel mailing list