KTar patch to fix speed problems with tar ioslave
Jan Schäfer
JanSchaefer at gmx.de
Sun Feb 8 21:18:43 CET 2004
Hi Justin,
> I admit that I haven't read through much of the ioslave code at all, so
> just going off what you've written.
>
> Why don't you instead of untarring the entire file (which it appears you've
> done), extract the directory itself, and move each file over through the
> ioslave? This way, you're not using much more disk space (only enough to
> hold the directory, hopefully it's on the same partition as the target, or
> there's enough space in /tmp or wherever), and it's just as fast.
>
> If that's what you do, then ignore this, since I don't know much about the
> design of the ioslaves, but it does appear that you're decompressing the
> entire file/device, and if you only want a single directory, you need only
> to extract that path from the archive, so this would be a little faster.
>
> Please no flames if I missed something.
Currently I decompress the entire file. That has to be done because the io
slave does not know which file it will have to return next. I decompress a
tar.bz2 file to a plain .tar file. It would be possible, however, to extract
that tar file as well and work on the extracted directory instead.
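A rough sketch of that idea, written in Python rather than with the real KTar
and io slave classes. The function names decompress_to_tar() and read_member()
are made up for illustration and do not exist in KDE:

```python
import bz2
import shutil
import tarfile
from pathlib import Path


def decompress_to_tar(bz2_path: str, work_dir: str) -> str:
    """Decompress e.g. archive.tar.bz2 into a plain archive.tar in work_dir.

    This pays the decompression cost once, up front.
    """
    # "archive.tar.bz2" -> "archive.tar"
    tar_name = Path(bz2_path).with_suffix("").name
    tar_path = Path(work_dir) / tar_name
    with bz2.open(bz2_path, "rb") as src, open(tar_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return str(tar_path)


def read_member(tar_path: str, member_name: str) -> bytes:
    """Return the contents of one file from the uncompressed tar.

    Any member can be requested in any order, because the plain .tar
    file supports seeking without going through a compression filter.
    """
    with tarfile.open(tar_path, "r:") as tar:
        extracted = tar.extractfile(member_name)
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file")
        return extracted.read()
```

Every later request then only touches the uncompressed .tar file.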
My implementation was just a first test of my idea to decompress the archive
in one go and work on the decompressed archive. I did it this way because it
was much easier to implement. I also noticed that the tar io slave is pretty
fast if the tar file is not compressed, so I wanted to test my theory that
the compression filters are the bottleneck.
The question is whether the other approach would improve the extraction speed.
Moving the already extracted files would take almost no time (if they are on
the same partition), but the initial extraction would take longer. There is
also the question of where to implement that approach: in the KTar class or
in the tar io slave?
The KTar class already offers a way to extract the whole archive at once (the
directory()->copyTo() method), so hopefully the compression filters would
not be a bottleneck then. The io slave could call this method to extract
the contents to a temporary directory and work on that directory afterwards.
I think this would not be much work to implement.
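To illustrate the "extract once, then work on the directory" idea, here is a
minimal sketch in Python. It does not use the KTar API; extract_all(),
list_dir() and read_file() are hypothetical helpers standing in for what the
io slave would do:

```python
import tarfile
import tempfile
from pathlib import Path


def extract_all(archive_path: str) -> Path:
    """Extract the whole archive into a fresh temporary directory."""
    cache_dir = Path(tempfile.mkdtemp(prefix="tar-cache-"))
    # "r:*" lets tarfile detect the compression (none, gzip, bzip2, ...).
    # A real implementation should also check member paths before
    # extracting an archive from an untrusted source.
    with tarfile.open(archive_path, "r:*") as tar:
        tar.extractall(cache_dir)
    return cache_dir


def list_dir(cache_dir: Path, relative_dir: str) -> list:
    """List a directory of the archive by reading the extracted copy."""
    return sorted(p.name for p in (cache_dir / relative_dir).iterdir())


def read_file(cache_dir: Path, relative_path: str) -> bytes:
    """Read one file of the archive from the extracted copy."""
    return (cache_dir / relative_path).read_bytes()
```

After the single call to extract_all(), listing and reading are ordinary file
system operations and never go through the compression filter again.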
But there is also the space argument. You are right that if the files were
moved out of the temporary directory, it would take no extra space.
But what if the user wants to extract files a second time?
Then the whole archive would have to be extracted again, because the files
would already have been moved. This would be a rare case, however, and
perhaps we could live with that.
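The trade-off can be shown with a small Python sketch. The helper move_out()
is hypothetical and only stands in for the io slave handing a file over to
its target:

```python
import shutil
from pathlib import Path


def move_out(cache_dir: Path, relative_path: str, target_dir: Path) -> Path:
    """Move one extracted file from the temporary directory to target_dir.

    On the same partition this is a cheap rename and needs no extra
    space, but the file is gone from the temporary directory afterwards.
    """
    source = cache_dir / relative_path
    destination = target_dir / source.name
    shutil.move(str(source), str(destination))
    return destination
```

Calling move_out() a second time for the same file fails with
FileNotFoundError, which is exactly the case where the archive would have to
be extracted again.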
Jan