PyKHTML, a library for scraping the web for Python

Paul Giannaros ceruleanblaze at gmail.com
Sun Apr 29 14:57:09 BST 2007


Following the one or two posts that I've made to the list regarding KHTML, I'd
like to announce the project I've been working on that uses it:

PyKHTML is a library for creating website spiders and scrapers. It is written
for the Python programming language and uses PyQt/PyKDE. It is built on KHTML,
which supports JavaScript, cookies, and HTTPS, tolerates bad markup, and does
all of this quickly and efficiently. PyKHTML exposes these capabilities
through a Pythonic, asynchronous API intended to be intuitive and easy to use.
It is of particular use to companies or individuals who need to interface
with websites programmatically (scraping data, submitting forms).
The PyKHTML website, at http://paul.giannaros.org/pykhtml, has a development
changelog, API documentation, and source downloads.




More information about the kfm-devel mailing list