Hi Sam,<div><br></div><div>The bottleneck in OC_Archive_TAR is [1]. Caching $headers in a protected member of the OC_Archive_TAR class cut the CPU load in half while browsing a tar.gz archive.</div><div>Here is a call graph for browsing the tar archive [2]. As you can see, tar->listContent() is called twice for each file in the archive: first to get the size and then to get the MIME type.</div>
<div><br></div><div>[1] <a href="https://github.com/owncloud/core/blob/master/lib/archive/tar.php#L122">https://github.com/owncloud/core/blob/master/lib/archive/tar.php#L122</a></div><div>[2] <a href="https://owncube.com/public.php?service=files&token=e2a65ae1442997590c77e6eab72370cdce41e98d&file=/callgraph2.png">https://owncube.com/public.php?service=files&token=e2a65ae1442997590c77e6eab72370cdce41e98d&file=/callgraph2.png</a></div>
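<div><br></div><div>A rough PHP sketch of the caching idea, for illustration only. It assumes, as described above, that the code at [1] rebuilds the header list on every size or MIME lookup. The class CachedTarHeaders, its methods, and the closure that stands in for $tar->listContent() are hypothetical names, not the actual OC_Archive_TAR API. Only the 'filename' and 'size' header keys are assumed.</div>

```php
<?php

// Hypothetical sketch: read the archive index once and reuse it.
// The closure passed to the constructor stands in for $tar->listContent();
// class and method names are illustrative, not ownCloud's real API.
class CachedTarHeaders
{
    /** @var callable Returns a list of header arrays, like listContent(). */
    protected $lister;

    /** @var array|null Cached headers; null until first requested. */
    protected $headers = null;

    /** @var int How many times the underlying lister actually ran. */
    public $listCalls = 0;

    public function __construct(callable $lister)
    {
        $this->lister = $lister;
    }

    // Build the header list on the first call only; later calls reuse it.
    public function getHeaders()
    {
        if ($this->headers === null) {
            $this->listCalls++;
            $this->headers = call_user_func($this->lister);
        }
        return $this->headers;
    }

    // Find the header for one path (used for both size and MIME lookups).
    public function getHeader($path)
    {
        foreach ($this->getHeaders() as $header) {
            if ($header['filename'] === $path) {
                return $header;
            }
        }
        return null;
    }

    // Drop the cache; call this whenever the archive is modified.
    public function invalidate()
    {
        $this->headers = null;
    }
}

// Usage with a fake index standing in for a real tar.gz.
$archive = new CachedTarHeaders(function () {
    return array(
        array('filename' => 'docs/readme.txt', 'size' => 120),
        array('filename' => 'img/logo.png', 'size' => 2048),
    );
});

$size = $archive->getHeader('img/logo.png')['size'];        // size lookup
$found = $archive->getHeader('docs/readme.txt') !== null;   // MIME lookup reuses the cached header
echo 'index reads: ' . $archive->listCalls . "\n";          // prints "index reads: 1"
```

<div>One design note: the cached list has to be dropped whenever the archive changes (for example when a file is added or removed), otherwise later size and MIME lookups would return stale data.</div>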
<div><br></div><div>Hope this helps,</div><div>Victor<br><br><div class="gmail_quote">On Wed, Sep 19, 2012 at 9:08 PM, Sam Tuke <span dir="ltr"><<a href="mailto:samtuke@owncloud.com" target="_blank">samtuke@owncloud.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Here's a status update on bug oc-1224 that I've been working on.<br>
<br>
When files_archive is enabled and a new archive is added to ownCloud, all<br>
subfiles and directories are opened on the server and added to the filecache.<br>
This hogs resources to a show-stopping extent.<br>
<br>
As far as I can see, it's not necessary to do this. With gzipped files, if only<br>
the archive file itself is read into the cache when it is first uploaded, you<br>
can still browse through the archive in the web interface quite happily,<br>
because each time you open a subdirectory, the contents of that directory are<br>
read into the cache. I have used path regexes to prevent recursive scanning of<br>
archives, and this works for gzip files.<br>
<br>
Zip files, however, don't work the same way. Unless the whole archive is<br>
scanned into the file cache when it's first added, the archive is not browsable<br>
via the web interface. I'm not sure why rescans are triggered for gzip archives<br>
but not for zip archives.<br>
<br>
However, even when recursive scanning of gzip archives is prohibited, the<br>
resources required to scan even a few files within an archive are impractical.<br>
On my dual-core machine, a gzip file with only three subfiles (small images) and<br>
three subdirectories takes about 30 seconds to scan. Scanning the contents of<br>
the top level of any real-world web app archive (like tinymce or phplist),<br>
which has about ten files / directories in its root folder, takes more than 5<br>
minutes.<br>
<br>
I'm currently trying to identify exactly where the bottleneck is. Commenting<br>
out all of scanfile() (in filecache.php) doesn't ease things, so the issue<br>
presumably lies somewhere in scan(). I think I need to get xdebug working again<br>
to investigate further.<br>
<br>
That's it from me for this week.<br>
<br>
Best,<br>
<br>
Sam.<br>_______________________________________________<br>
Owncloud mailing list<br>
<a href="mailto:Owncloud@kde.org">Owncloud@kde.org</a><br>
<a href="https://mail.kde.org/mailman/listinfo/owncloud" target="_blank">https://mail.kde.org/mailman/listinfo/owncloud</a><br>
<br></blockquote></div><br></div>