KTar patch to fix speed problems with tar ioslave

Jan Schaefer jan at kdewebdev.org
Tue Feb 10 13:21:00 CET 2004


Hi,

here is my status report.

> Well, the copy job doesn't support it yet, so you will need to start there
> then. The idea is to use a single MultiGetJob instead of multiple Get jobs.

I looked at the copy job source.
As far as I understood it, the whole copy job is designed to go recursively 
through the directories, collect all files, and call the KIO::file_copy 
method to copy each single file.
The FileCopyJob then handles the get and put calls.
The CopyJob itself does not use these low-level methods, and as far as I 
understood the code, it would be a lot of work to make it use the MultiGet 
job for copying files. Since the CopyJob code is really difficult to 
understand, I had better not start messing around in there.
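
Just to illustrate the difference, here is a rough sketch of the two 
patterns, written against the KDE 3 KIO API as I understand it. This is 
simplified illustration code, not the real CopyJob, and the function names 
copyOneByOne/copyWithMultiGet are made up:

    #include <kio/job.h>
    #include <kio/jobclasses.h>
    #include <kurl.h>

    // What CopyJob effectively does today: one FileCopyJob (get + put)
    // per file. For tar:/ every single get can force the compression
    // filter to re-seek through the archive.
    void copyOneByOne(const KURL::List &srcs, const KURL &destDir)
    {
        for (KURL::List::ConstIterator it = srcs.begin(); it != srcs.end(); ++it) {
            KURL dest(destDir);
            dest.addPath((*it).fileName());
            KIO::file_copy(*it, dest); // fire-and-forget for brevity
        }
    }

    // What the suggestion amounts to: queue all URLs on one MultiGetJob
    // so the slave sees the whole batch at once (assumes srcs is
    // non-empty). Note that this only covers the get side; the put side
    // is exactly the part of CopyJob/FileCopyJob that would have to be
    // rewritten.
    void copyWithMultiGet(const KURL::List &srcs)
    {
        long id = 0;
        KIO::MultiGetJob *job = KIO::multi_get(id++, srcs.first(), KIO::MetaData());
        KURL::List::ConstIterator it = srcs.begin();
        for (++it; it != srcs.end(); ++it)
            job->get(id++, *it, KIO::MetaData());
        // The data(long id, const QByteArray &) signal then delivers the
        // contents per id; writing them out is up to the caller.
    }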

> You probably also need to store in the .protocol file whether the protocol
> supports mget, because that is currently hardcoded in MultiGetJob::start
> See kdelibs/kio/kio/kprotocolinfo.h and
> kdelibs/kdecore/kprotocolinfo_kdecore.cpp for how to make that information
> available to KIO::Job
This is easy; however, it would be BIC (binary incompatible) IMHO, and I 
would like the speed fix to get into 3.2.1 if possible.
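
For illustration: a .protocol file is a plain INI-style desktop file, so 
the capability flag itself would only be one new key. The multiGet key 
below is hypothetical and does not exist today, and KProtocolInfo would 
need a new accessor (along the lines of the existing supportsReading()) to 
expose it to KIO::Job, which is where the compatibility concern comes in:

    [Protocol]
    exec=kio_tar
    protocol=tar
    input=filesystem
    output=filesystem
    reading=true
    # hypothetical new key, read by a new accessor such as
    # static bool KProtocolInfo::supportsMultiGet(const KURL &url);
    multiGet=true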

I thought about the whole thing a lot more and came to the conclusion that 
even if the multiGet method could be implemented for use by the copy job, 
it would still not be enough.
I will start with the copyTo() method of the KArchiveDirectory class.
I tested the speed of this method with the old filter code.
And it was really fast!
So I looked at the code of KArchiveDirectory, and what is done there is 
that the files are sorted by their position in the tar file before they 
are extracted. The extraction is then done in a linear way, so the 
compression filters never need to jump backwards, and that jumping is what 
makes the whole process slow.
So even if the multiGet method could be used, the URLs would have to be 
sorted in the right order, and the CopyJob, at least, cannot know about 
this. The kio tar slave could do the sorting itself. However, the CopyJob 
would then not know about the new order, and I think that could produce 
some other problems (though I am not sure here).
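
In code, the idea is roughly this. It is a minimal sketch of the sorting 
trick as I read it from KArchiveDirectory, not the actual copyTo() 
implementation; error handling and the directory structure of the output 
are omitted:

    #include <karchive.h>
    #include <qfile.h>
    #include <algorithm>
    #include <vector>

    // Recursively collect all files below a directory entry.
    static void collectFiles(const KArchiveDirectory *dir,
                             std::vector<const KArchiveFile *> &files)
    {
        QStringList names = dir->entries();
        for (QStringList::ConstIterator it = names.begin(); it != names.end(); ++it) {
            const KArchiveEntry *e = dir->entry(*it);
            if (e->isDirectory())
                collectFiles(static_cast<const KArchiveDirectory *>(e), files);
            else
                files.push_back(static_cast<const KArchiveFile *>(e));
        }
    }

    // Sort key: the byte offset of the file inside the tar stream.
    static bool byPosition(const KArchiveFile *a, const KArchiveFile *b)
    {
        return a->position() < b->position();
    }

    // Extract in one forward pass so the decompression filter never
    // has to seek backwards (output paths flattened for brevity).
    void extractLinearly(const KArchiveDirectory *root, const QString &destDir)
    {
        std::vector<const KArchiveFile *> files;
        collectFiles(root, files);
        std::sort(files.begin(), files.end(), byPosition);
        for (unsigned i = 0; i < files.size(); ++i) {
            QFile out(destDir + "/" + files[i]->name());
            if (out.open(IO_WriteOnly)) {
                QByteArray data = files[i]->data(); // strictly forward read
                out.writeBlock(data.data(), data.size());
            }
        }
    }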

To sum it up: if the multiGet method is to be used for the tar slave, it 
would be a lot of work, it would have to be done in the critical area of 
the CopyJob, and it would also be BIC (IMHO).
And all of this just to avoid the small drawback of the extra space needed 
for a temporarily extracted tar file.

I also thought again about the other approach: extracting the whole archive 
to a temporary directory and working on that directory instead of on the 
tar file. This would perhaps solve the problem of the extra space, but it 
introduces some new problems. For example, the file ownership that is 
stored in the tar file would be lost, and someone who only wants to look 
into a tar file would be a little surprised to see that (s)he apparently 
packed the Qt source code from Trolltech. ;-)

So I came to the conclusion that my current patch is the easiest way to fix 
the speed problem, and it is also not BIC. I even think that the user 
expects the tar file to be extracted to a temporary file when looking at it 
(I could be totally wrong about this, however).


Greetings,

	Jan

