KTar patch to fix speed problems with tar ioslave

Jan Schaefer jan at kdewebdev.org
Sun Feb 8 19:10:11 CET 2004


Hi,

this is my first patch for kdelibs, so please be considerate. ;-)

Problem
-------
Konqueror has the ability to open tar files with the tar ioslave.
You can browse the tar file and copy files or whole directories from the tar 
file to other places.
This is good so far. However, if the tar file is compressed with gzip, or even
worse with bzip2, it takes ages to copy directories from the tar file to other
places (I tried to extract an 11 MB tar.bz2 with the Qt sources, and after
5 minutes only 30 of 6000 files had been extracted).
There is also a bug report that describes this problem: #25275.

The Source of Evil
------------------
Since this problem isn't solved in KDE 3.2, I started to search for the
bottleneck. The first problem lies in the design of the ioslaves. If whole
directories are copied, the files are copied one after the other, each time
asking the ioslave to get the next file. This works in most cases; for
example, it works nicely for tar files that are not compressed. But why are
compressed tar files so much slower? The answer lies in the implementation of
the KTar class, which is used by the tar ioslave.

The KTar class uses compression filters if the tar file it has to open is
compressed. So far so good, but if you want to extract a file from the
archive, the data has to be run through the compression filter until the file
is found, and only then can the file be extracted. If you now extract one file
after the other, the compression filter has to be walked through each time to
find the beginning of the file in the archive. And this is the source of evil,
because it means that for every file the archive has to be decompressed up to
that file's position first. So if you have a bzipped file with 6000 entries,
the archive is (at least partially) decompressed 6000 times! But decompressing
it only once would be enough.
This is the reason why the current tar ioslave is so slow (and unusable for
bigger files).
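To make the cost concrete, here is a toy model in C++. It is not code from KTar or from the patch: ForwardOnlyStream, costPerEntryRestart and costSinglePass are hypothetical names. The model assumes that every entry request starts a fresh pass through the filter, and it ignores tar headers and padding, counting only decompressed bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a decompression filter (hypothetical; not KDE's KFilterDev).
// It only moves forward, and it counts every byte it has to decompress.
class ForwardOnlyStream {
public:
    // Start decompressing again from the beginning of the archive.
    void restart() { pos_ = 0; }

    // Advance to an absolute offset in the decompressed data. Skipped bytes
    // are still decompressed (and thrown away), so they count as work.
    void skipTo(std::size_t offset) {
        assert(offset >= pos_ && "a forward-only stream cannot seek backwards");
        work_ += offset - pos_;
        pos_ = offset;
    }

    // Decompress n bytes that are actually wanted.
    void read(std::size_t n) {
        work_ += n;
        pos_ += n;
    }

    std::size_t work() const { return work_; }

private:
    std::size_t pos_ = 0;
    std::size_t work_ = 0;
};

// One fresh pass through the filter per entry: each request restarts at
// offset 0 and skips everything before the wanted entry.
std::size_t costPerEntryRestart(const std::vector<std::size_t>& sizes) {
    ForwardOnlyStream s;
    std::size_t offset = 0;
    for (std::size_t size : sizes) {
        s.restart();
        s.skipTo(offset);
        s.read(size);
        offset += size;
    }
    return s.work();
}

// The whole archive decompressed once, front to back.
std::size_t costSinglePass(const std::vector<std::size_t>& sizes) {
    ForwardOnlyStream s;
    for (std::size_t size : sizes) s.read(size);
    return s.work();
}
```

With 6000 entries of one unit each, costPerEntryRestart returns 6000 * 6001 / 2 = 18,003,000 units, while costSinglePass returns 6000. In this model the per-entry approach grows quadratically with the number of entries.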

Solution
--------
After I had found the source of the problem, I thought about a solution.
One solution would be to change the ioslave interface to allow fetching whole
directories. I do not think that this is necessary, as the current interface
works very well most of the time. Another solution would be to change the KTar
class, and that is what I have done.

Instead of applying the compression filters to the actual device that is used 
by KTar, I use the compression filters to decompress the tar file and store 
the decompressed file in a temporary file. This temporary file is then used 
as the device for KTar. Now it is no longer necessary to walk through the
compression filter each time a file is extracted from the archive.
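The idea can be sketched with the C++ standard library alone. This is a minimal sketch under stated assumptions, not the patch itself: ForwardSource, spoolToTempFile and readEntry are hypothetical names, the decompression filter is abstracted as a callback that delivers decompressed bytes in order, and std::tmpfile() stands in for whatever temporary file mechanism the patch really uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Hypothetical forward-only source of decompressed bytes: writes up to cap
// bytes into buf and returns how many it wrote (0 means end of stream).
// In KTar this role is played by the gzip/bzip2 filter.
using ForwardSource = std::function<std::size_t(char* buf, std::size_t cap)>;

// Run the decompression exactly once, spooling the result into an anonymous
// temporary file. Returns nullptr on failure. The caller must fclose() the
// result; std::tmpfile() files are removed automatically when closed.
std::FILE* spoolToTempFile(const ForwardSource& source) {
    std::FILE* tmp = std::tmpfile();
    if (!tmp) return nullptr;
    std::vector<char> buf(64 * 1024);
    std::size_t n;
    while ((n = source(buf.data(), buf.size())) > 0) {
        if (std::fwrite(buf.data(), 1, n, tmp) != n) {
            std::fclose(tmp);
            return nullptr;
        }
    }
    return tmp;
}

// Random access into the decompressed data: entries can be read in any
// order without running the decompression filter again.
std::string readEntry(std::FILE* tmp, long offset, std::size_t size) {
    assert(tmp != nullptr);
    std::string out(size, '\0');
    // fseek() is also required before reading after the writes above.
    if (std::fseek(tmp, offset, SEEK_SET) != 0) return std::string();
    std::size_t got = std::fread(&out[0], 1, size, tmp);
    out.resize(got);
    return out;
}
```

The decompression cost is paid once, in spoolToTempFile(); every later readEntry() call is just an fseek()/fread() on the temporary file, in any order. The price is the disk space for the decompressed copy.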

The solution increases the extraction speed of tar.bz2 (and tar.gz) files
dramatically. It now takes about two minutes to extract the whole 11 MB Qt
tar.bz2 on my computer.

However, there is a small drawback: it needs extra disk space for the
temporary file. But I cannot think of a better solution.

For me this patch works very well, but what do you think of it?


Jan Schaefer

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ktar.patch
Type: text/x-diff
Size: 6932 bytes
Desc: not available
Url : http://mail.kde.org/pipermail/kde-optimize/attachments/20040208/a50281db/ktar.bin


More information about the Kde-optimize mailing list