KTar patch to fix speed problems with tar ioslave

Jan Schäfer JanSchaefer at gmx.de
Sun Feb 8 21:18:43 CET 2004


Hi Justin,

> I admit that I haven't read through much of the ioslave code at all, so
> just going off what you've written.
>
> Why don't you instead of untarring the entire file (which it appears you've
> done), extract the directory itself, and move each file over through the
> ioslave?  This way, you're not using much more disk space (only enough to
> hold the directory, hopefully it's on the same partition as the target, or
> there's enough space in /tmp or wherever), and it's just as fast.
>
> If that's what you do, then ignore this, since I don't know much about the
> design of the ioslaves, but it does appear that you're decompressing the
> entire file/device, and if you only want a single directory, you need only
> to extract that path from the archive, so this would be a little faster.
>
> Please no flames if I missed something.
Currently I decompress the entire file. That has to be done because the io 
slave does not know which file it has to return next. I decompress the tar.bz2 
file to a plain .tar file. It would be possible, however, to extract that tar 
file too and work on the decompressed directory instead.
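In case it helps to picture the idea, the decompression step would look 
roughly like this (only a rough sketch, assuming the KDE 3 KFilterDev and 
Qt 3 APIs; not the actual code from my patch, and the paths are illustrative):

    #include <qfile.h>
    #include <kfilterdev.h>

    // Decompress foo.tar.bz2 into a plain foo.tar so later seeks are cheap.
    bool decompressToTar(const QString &bz2Path, const QString &tarPath)
    {
        // deviceForFile() picks the bzip2/gzip filter from the file name
        QIODevice *in = KFilterDev::deviceForFile(bz2Path);
        if (!in || !in->open(IO_ReadOnly))
            return false;

        QFile out(tarPath);
        if (!out.open(IO_WriteOnly)) {
            delete in;
            return false;
        }

        char buffer[8192];
        Q_LONG n;
        while ((n = in->readBlock(buffer, sizeof(buffer))) > 0)
            out.writeBlock(buffer, n);

        out.close();
        in->close();
        delete in;
        return true;
    }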

My implementation was just a first test of my idea to extract the archive at 
once and work on the extracted archive. I did it this way because the 
implementation was much easier to realize. I also noticed that the tar io 
slave is pretty fast if the tar file is not compressed. I just wanted to test 
my theory that the compression filters are the bottleneck.

The question is whether the other approach would improve the extraction speed.
Moving the already decompressed files would take no time (if they are on the 
same partition), but the initial extraction would take longer. There is also 
the question of where to implement that approach: in the KTar class or in the 
tar io slave?
The KTar class already offers a way to extract the whole archive at once (the 
directory()->copy_to() method), so hopefully the compression filters would 
not be a bottleneck then. The io slave could call this method, extract the 
contents to a temporary directory, and work on that directory later. This 
would not be much work to implement, I think.
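A rough sketch of what I mean (assuming the current kdelibs API with the 
spelling copyTo(); the temporary-directory handling is only illustrative):

    #include <ktar.h>
    #include <karchive.h>

    // Unpack the whole (possibly compressed) archive into tempDir in one pass,
    // so the io slave can serve later requests from plain files on disk.
    bool extractWholeArchive(const QString &archivePath, const QString &tempDir)
    {
        KTar tar(archivePath);          // compression filter chosen from the file name
        if (!tar.open(IO_ReadOnly))
            return false;

        const KArchiveDirectory *root = tar.directory();
        root->copyTo(tempDir);          // recursive copy of the archive contents
        tar.close();
        return true;
    }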

But there is also the space argument. You are right that if the files were 
moved out of the temporary directory, it would take no extra space.
But what if the user wants to extract files a second time? 
Then the whole archive would have to be extracted again, because the files 
would already have been moved. This would be a rare case, however, and perhaps 
we could live with it.


Jan
