KTar patch to fix speed problems with tar ioslave
Jan Schaefer
jan at kdewebdev.org
Sun Feb 8 19:10:11 CET 2004
Hi,
this is my first patch for kdelibs, so please be considerate. ;-)
Problem
---------
Konqueror can open tar files with the tar ioslave. You can browse the tar
file and copy files or whole directories from it to other places.
This is good so far. However, if the tar file is compressed with gzip or,
even worse, with bzip2, it takes ages to copy directories out of it. (I tried
to extract an 11 MB tar.bz2 with the Qt sources, and after 5 minutes only 30
of 6000 files had been extracted.)
There is also a bug report that describes this problem: #25275.
The Source of Evil
------------------
Since this problem isn't solved in KDE 3.2, I started to search for the
bottleneck. The first problem lies in the design of the io slaves. If whole
directories are copied, the files are copied one after the other, and the io
slave is asked for each file separately. This works in most cases; for
example, it works nicely for tar files which are not compressed. But why are
compressed tar files so much slower? The answer lies in the implementation of
the KTar class, which is used by the tar io slave.
The KTar class uses compression filters if the tar file it has to open is
compressed. So far so good, but if you want to extract a file from the
archive, the data has to be run through the compression filter until the file
is found, and only then can the file be extracted. If you now extract one
file after the other, the compression filter has to be walked through each
time to find the beginning of the file in the archive. And this is the source
of evil, because it actually means that for every file the archive has to be
decompressed again. So if you have a bzipped file with 6000 entries, the file
is decompressed 6000 times! But it would be enough to decompress it only once.
This is the reason why the current tar io slave is so slow (and unusable for
bigger files).
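To get a feeling for the numbers, here is a small back-of-the-envelope model
in Python. It is only a sketch under assumptions that are mine, not taken
from KTar: every member has a 512-byte tar header and data padded to 512-byte
blocks, the filter restarts at the beginning of the stream for each request,
and it stops as soon as the requested member has been read. The function
names are made up for this illustration.

```python
BLOCK = 512  # tar stores headers and data in 512-byte blocks


def member_span(size):
    """Bytes one member occupies in the tar stream: header plus padded data."""
    padded = -(-size // BLOCK) * BLOCK  # round up to a full block
    return BLOCK + padded


def bytes_single_pass(sizes):
    """Decompressed bytes when the whole archive is read exactly once."""
    return sum(member_span(s) for s in sizes)


def bytes_restart_each_time(sizes):
    """Decompressed bytes when every request starts again at offset 0.

    Assumption: the filter stops right after the requested member, so this
    is an optimistic estimate of the repeated work.
    """
    total = 0
    end_offset = 0
    for s in sizes:
        end_offset += member_span(s)
        total += end_offset  # everything up to this member is decompressed again
    return total


if __name__ == "__main__":
    sizes = [1024] * 6000  # hypothetical archive: 6000 files of 1 KB each
    once = bytes_single_pass(sizes)
    repeated = bytes_restart_each_time(sizes)
    print(once, repeated, repeated / once)
```

Even with the optimistic assumption that decompression stops at the requested
member, the repeated work grows with the square of the number of files, while
a single pass grows only linearly.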
Solution
--------
After I had found the source of the problem, I thought about a solution.
One solution would be to change the interface of the io slaves so that whole
directories can be requested at once. I do not think that this is necessary,
as the current interface works very well most of the time. Another solution
would be to change the KTar class, and that is what I have done.
Instead of applying the compression filters to the actual device that is used
by KTar, I use the compression filters to decompress the tar file once and
store the decompressed data in a temporary file. This temporary file is then
used as the device for KTar. It is no longer necessary to walk through the
compression filter each time a file is extracted from the archive.
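The idea can be sketched outside of KDE as well. The following Python example
is not the patch itself (the patch changes the C++ KTar class); it only shows
the same technique with the standard bz2, tempfile and tarfile modules. The
helper names and the sample archive are made up for the illustration.

```python
import bz2
import io
import os
import shutil
import tarfile
import tempfile


def make_sample_archive(path, count=5):
    """Create a small .tar.bz2 with `count` text members (sample data only)."""
    with tarfile.open(path, "w:bz2") as tar:
        for i in range(count):
            data = f"file {i}\n".encode()
            info = tarfile.TarInfo(name=f"dir/file{i}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def open_via_temp_file(compressed_path):
    """Decompress the archive once into a temporary file and open that file
    as a plain, seekable tar archive."""
    tmp = tempfile.TemporaryFile()
    with bz2.open(compressed_path, "rb") as src:
        shutil.copyfileobj(src, tmp)  # the only pass through the decompressor
    tmp.seek(0)
    # mode "r:" means: read without any compression filter
    return tarfile.open(fileobj=tmp, mode="r:")


def read_member(tar, name):
    """Read one member; on the plain temp file this is just a seek and a read."""
    return tar.extractfile(name).read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        archive = os.path.join(workdir, "sample.tar.bz2")
        make_sample_archive(archive)
        with open_via_temp_file(archive) as tar:
            # Members can be requested one by one, in any order.
            for name in reversed(tar.getnames()):
                print(name, read_member(tar, name))
```

The price is the same as in the patch: the temporary file needs as much disk
space as the uncompressed archive.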
The solution increases the extraction speed of tar.bz2 (and tar.gz) files
dramatically. It now takes about two minutes to extract the whole 11 MB Qt
tar.bz2 on my computer.
However, there is a small drawback: it needs extra disk space for the
temporary file. But I cannot think of a better solution.
For me this patch works very well, but what do you think of it?
Jan Schaefer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ktar.patch
Type: text/x-diff
Size: 6932 bytes
Desc: not available
Url : http://mail.kde.org/pipermail/kde-optimize/attachments/20040208/a50281db/ktar.bin