KIO: Mass Copy of Files from Different Sources to Different Destinations

Tue Aug 25 15:10:46 BST 2009

After going back and refamiliarizing myself with the scheduler code, I have no 
objection in principle to the patch. Actually, I completely misunderstood the 
whole patch as a changing the scheduler behavior when it did not. The 
scheduler indeed allocates N ioslaves per protocol where N is obtained from 
the "maxInstances" property in the *.protocol files. For example, for the 
protocols in kdelibs this value is set to "3" for http, "4" for file and "2" 
for ftp.

The only issue of concern that needs to be investigated with this change AFAIC 
is one of bottleneck at the scheduler. How does this impact multiple 
applications making requests to distinct remote servers since the limits are 
per protocol and not host ? Is limiting the maximum number of  the file ioslave 
instances to "4" appropriate ? What about http etc etc ? In other words we 
have to test some use cases and if needed adjust these limits accordingly. One 
additional note... The documentation for KProtocolInfo::maxSlaves(...), the 
function the scheduler uses to find out the number of ioslaves it can spawn, 
states that the limits set by this value are 'soft' limits and can be exceeded 
under certain circumstances, but I cannot figure out under what circumstance 
that might happen to scheduled jobs in KIO::Scheduler. 

Anyhow, the only change that is missing from the patch is the modification of 
KIO::Scheduler::doJob function to remove a job from the scheduled jobs list if 
when it is called just like it is done in the ::scheduleJob function now. This 
then can be used by any download manager that wants to control the scheduling 
of jobs. So by default jobs get scheduled, but applications that want more 
control can simply call KIO::Scheduler::doJob(...) directly to bypass the 
scheduling process and ignore the limits set in the *.protocol files...

On Thursday 20 August 2009 11:04:22 David Faure wrote:
> On Thursday 20 August 2009, Dawit A. wrote:
> > "Direct scheduling" guarantees that if I make 4 requests one
> > after the other, all those requests will immediately spawn a kioslave.
>
> Which is a dangerous guarantee, if you end up with "FTP server: too many
> connections". So we need this to be configurable somehow (and I think
> the somehow is "by the user" rather than "through the kio api", since apps
> shouldn't have to care whether they're talking to often-limited-FTP or to
> fish/sftp/smb/file/etc.).
>
> > In the scheduled case, no matter what everything is queued and
> > processed serially one after the other.
>
> Are you sure? That would be bad, we don't want that, we want "process
> things at the pace of N slaves per protocol or host". But this might be
> what maelcum started working on and that isn't finished, though.
> In that case, we have to postpone this change indeed.
>
> For kio_file I certainly want the behavior where N is quite big, so that
> things happen in parallel -- but not too much, cf the initial post in this
> thread. N should be maybe 50, but not infinite.
>
> > Well the "not waiting much" is a very important point here. And as an
> > example I will give you a download manager that provides the user the
> > option to set the number of connections to make per site. How would it be
> > able to do that if the default is changed to "Scheduled" and "Direct" is
> > completely gone ?
>
> By having that option being honoured by the scheduler ;-)
>
> But this looks like the heart of our disagreement here; I'm thinking of a
> situation where the kio scheduler would implement that, and therefore -all-
> apps benefit from "N connections per site" limitations (which would fix a
> very old kde bug), rather than the user _having_ to use a download manager
> in order to get that feature.
>
> > In addition to the fact that KIO::Scheduler::scheduleJob does not allow
> > anything but a SimpleJob (dunno why that restriction) to be scheduled,
>
> Because only a SimpleJob is talking to an actual kioslave.
> A CopyJob is a high-level job which takes care of subjobs: one for doing a
> stat (file or directory?), one for doing a recursive listing of that
> directory, one job for copying each file, one mkdir job per directory to
> create, etc. That "high-level" job is totally unknown to the scheduler,
> since the scheduler is about giving slaves to jobs ;) But the highlevel
> CopyJob doesn't need any slave for itself, it's just a qobject with timers
> and slots and subjobs.
>
> > the function name itself is the source of confusion. It would have been
> > better if the function was named queueJobs or something along that line
> > and allowed you to provide the maximum number of ioslaves that can be
> > spawned to service any request per site/location.
>
> As I said, I don't think that decision belongs to the application. It
> belongs to the KDE configuration (i.e. the user, with sensible defaults).
> The fact that ftp.foo.org only accepts 1 connection per IP is not something
> that all apps should have to care about. Or the fact that the KIO scheduler
> shouldn't see 150 incoming jobs at the same time. We can take care of such
> problems inside KIO, without duplicating logic in the apps.
>
> An app which -wants- to control this, like a download manager, should of
> course be able to tune settings in the kio scheduler, I'm not rejecting
> that possibilty of course, but this is only for special kinds of apps, not
> something all apps should have to do.
>
> Oh and with my patch all kio jobs automatically get scheduled so
> scheduleJob() becomes completely internal, no need to call it from
> anywhere, and therefore its name doesn't really matter :-))