[Kget] the content fetcher plugin proposal

Ningyu Shi shiningyu at gmail.com
Mon Mar 31 05:54:29 CEST 2008


Hi everybody,
    I have just finished the proposal. Any comments for a last-minute update?
--------------------------
=Project Details=

==The Kross Framework==

In order to fetch specific content from a web page or website, we need
a design that allows the user to extract the content URLs from the web
page and return them to KGet for downloading. Considering the
complexity of analyzing web pages, a user script system should be a
good choice. Kross is a modular scripting framework that embeds
scripting interpreters such as Python, Ruby and JavaScript
transparently into native applications. In this project, Kross will be
used as a bridge between the script developer and the KGet core: it
passes the URL to the user script and passes the analysis result back
to KGet.

==The Transfer-Plugin==

KGet handles different kinds of download sessions via a transfer-plugin
system. Once a plugin is selected based on the URL provided by the
user, it is used to handle that download session. In this project, a
transfer plugin that handles content fetching will be implemented. The
plugin will determine which user script to use based on a
user-supplied function inside each script, then run the chosen script
to extract the content URLs and add them to the download list. All of
this will be done through the Kross framework on a separate thread, so
that a user script doing something complicated cannot block the whole
program.
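
The selection step could look roughly like the sketch below. It is
plain Python standing in for the real plugin, which would load scripts
through Kross. The names can_handle, extract, pick_script and
fetch_in_background, and both sample scripts, are hypothetical; the
proposal does not fix the name of the user-supplied function.

```python
import threading
from urllib.parse import urlparse


class VideoSiteScript:
    """Hypothetical user script for a video site."""

    def can_handle(self, url):
        # The user-supplied function the plugin uses to choose a script.
        return urlparse(url).netloc == "video.example.com"

    def extract(self, url, add_download):
        # A real script would fetch and parse the page here.
        add_download(url + "/movie.flv")


class GalleryScript:
    """Hypothetical user script for an image gallery."""

    def can_handle(self, url):
        return urlparse(url).netloc == "gallery.example.org"

    def extract(self, url, add_download):
        add_download(url + "/1.jpg")
        add_download(url + "/2.jpg")


def pick_script(scripts, url):
    """Return the first script that claims the URL, or None."""
    for script in scripts:
        if script.can_handle(url):
            return script
    return None


def fetch_in_background(scripts, url, add_download):
    """Run the chosen script on its own thread so that a slow script
    cannot block the caller. Returns the thread, or None if no script
    claims the URL."""
    script = pick_script(scripts, url)
    if script is None:
        return None
    worker = threading.Thread(target=script.extract, args=(url, add_download))
    worker.start()
    return worker
```

Asking each script whether it can handle a URL keeps the plugin free of
site-specific knowledge; adding support for a new site only means
dropping in another script.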

==Post-treatment==

After the user script has extracted the content URLs, it should add
them to the download queue by calling an addDownload(url) function. A
class will wrap the KGet::AddTransfer() call to provide this function
and to support user options such as downloading all URLs to a given
directory or letting the user choose case by case. We propose this
"call function" scheme because simply returning the URL list to KGet
has several problems. A download session may contain tens of URLs, and
waiting for the script to finish before starting any download would be
unacceptable. If we instead had another thread waiting for the script
to update a list variable, locking and synchronization issues would
arise and add complexity to the solution. Letting the script call the
addDownload() function therefore gives script developers more freedom
to do their job easily.
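
A minimal sketch of the "call function" scheme, again in plain Python
rather than through Kross. DownloadAdder, user_script and run are
hypothetical names, and the queue stands in for the wrapped KGet call;
only addDownload(url) comes from the proposal. The point is that the
host receives each URL as soon as the script reports it, while the
script itself never touches a lock.

```python
import queue
import threading


class DownloadAdder:
    """Hypothetical wrapper object handed to the user script.

    In KGet it would forward to the real add-transfer call; here it
    only puts (url, directory) pairs on a thread-safe queue.
    """

    def __init__(self, default_dir):
        self.default_dir = default_dir
        self.pending = queue.Queue()

    def addDownload(self, url, directory=None):
        # Each URL is handed over as soon as the script finds it.
        self.pending.put((url, directory or self.default_dir))


def user_script(page_url, adder):
    """Hypothetical script body: reports URLs one at a time."""
    for index in range(3):
        adder.addDownload("%s/part%d.flv" % (page_url, index))


def run(page_url, default_dir):
    """Run the script on its own thread and collect what it reports."""
    adder = DownloadAdder(default_dir)

    def script_thread():
        user_script(page_url, adder)
        adder.pending.put(None)  # sentinel: the script has finished

    threading.Thread(target=script_thread).start()

    transfers = []
    while True:
        item = adder.pending.get()  # blocks until the script reports something
        if item is None:
            break
        transfers.append(item)  # a real host would start the download here
    return transfers
```

Calling run("http://example.com/v", "/tmp") yields three (url,
directory) pairs, each using the default directory because the script
did not pass one.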

=Timeline=

Phase 1, May 26 to June 29
Coding up the plugin structure and getting a dummy example to work.

Phase 2, June 30 to July 14
Implementing a YouTube/Google Video Flash movie downloader within the
framework as an example.

Phase 3, July 15 to August 3
Implementing the GUI and a simple script management system.

Phase 4, Remaining time
Debugging, testing and polishing the project. Writing documentation
and possibly more example scripts if time permits.

=Personal Info & Experience=
I'm a graduate student majoring in ECE. My research is about electronic
device simulation. I'm familiar with C++ and have done several
nontrivial projects, one of which is a device simulator that makes
heavy use of OO design. I have read some tutorials on Qt and worked
through some toy examples, which made me feel quite comfortable with
it. I've written a tiny multi-threaded content fetcher that uses wget
to do the downloading. That project is written in Python, with an
analyzing thread, several downloading threads and a GUI thread that
updates the status using PyQt. It has certain weak points. Wget can't
download files concurrently and only allows one thread per download
session, and running many wget processes is quite expensive. Python's
multi-threading library is not efficient enough, so sometimes the
analyzing thread makes the whole program slow. However, these problems
are well solved in KGet. KIO handles the download session, so we can
have multiple concurrent downloads. We have QThread to handle the
threading efficiently. Most importantly, KGet has a much better user
interface and is easy to extend.

I can work on this project 10-12 hours per week, given that I still
have to do my research during the summer. I'm comfortable working
with a mentor through email and IM/IRC chats. I'm a native Chinese
speaker and quite fluent in English, so communication should not be a
problem.

Among the various downloader projects on Linux, I find KGet to be the
most promising: it has a well-designed object model and a decent
codebase. KGet is under active development, so I can work with the
developer team and the community to get this job done.
---------------------------------------
Thanks
-- 
Ningyu

