Hi Sam,<div><br></div><div>The bottleneck in OC_Archive_TAR is at [1]. Caching $headers in a protected member of the OC_Archive_TAR class cut CPU load by half while browsing a tar.gz archive.</div><div>Here is a call graph for browsing the tar archive [2]. As you can see, tar->listContent() is called twice for each file in the archive: first to get the size, and a second time to get the MIME type.</div>
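<div><br></div><div>To illustrate the caching idea, here is a minimal sketch in Python rather than the actual PHP code. The class CachedTarListing, its parse_count counter and the helper build_demo_archive are hypothetical names made up for this example; the point is only that the header list is parsed once and then reused for both the size and the MIME lookup.</div>

```python
import io
import mimetypes
import tarfile


class CachedTarListing:
    """Hypothetical wrapper: parses the tar headers once and reuses them."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._headers = None   # filled on first use, like a cached $headers member
        self.parse_count = 0   # how many times the archive was actually scanned

    def _get_headers(self):
        # Scan the archive only if no cached header list exists yet.
        if self._headers is None:
            self._fileobj.seek(0)
            with tarfile.open(fileobj=self._fileobj, mode="r:*") as tar:
                self._headers = {m.name: m for m in tar.getmembers()}
            self.parse_count += 1
        return self._headers

    def size(self, path):
        return self._get_headers()[path].size

    def mime(self, path):
        self._get_headers()[path]  # raises KeyError for unknown paths
        guessed, _ = mimetypes.guess_type(path)
        return guessed or "application/octet-stream"


def build_demo_archive():
    # Build a small in-memory tar.gz so the sketch is self-contained.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in (("a.txt", b"hello"), ("img/b.png", b"12345678")):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    listing = CachedTarListing(build_demo_archive())
    for path in ("a.txt", "img/b.png"):
        print(path, listing.size(path), listing.mime(path))
    print("archive scans:", listing.parse_count)
```

<div>Without the cache, every size and MIME lookup would rescan the archive, so the number of scans grows with twice the number of files. A real cache would also need to be invalidated whenever the archive is modified.</div>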

<div><br></div><div>[1] <a href="https://github.com/owncloud/core/blob/master/lib/archive/tar.php#L122">https://github.com/owncloud/core/blob/master/lib/archive/tar.php#L122</a></div><div>[2] <a href="https://owncube.com/public.php?service=files&token=e2a65ae1442997590c77e6eab72370cdce41e98d&file=/callgraph2.png">https://owncube.com/public.php?service=files&token=e2a65ae1442997590c77e6eab72370cdce41e98d&file=/callgraph2.png</a></div>

<div><br></div><div>Hope this helps,</div><div>Victor<br><br><div class="gmail_quote">On Wed, Sep 19, 2012 at 9:08 PM, Sam Tuke <span dir="ltr"><<a href="mailto:samtuke@owncloud.com" target="_blank">samtuke@owncloud.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Here's a status update on bug oc-1224 that I've been working on.<br>

<br>

When files_archive is enabled, and a new archive is added to ownCloud, all<br>

subfiles and directories are opened on the server and added to the filecache.<br>

This hogs resources to a show-stopping extent.<br>

<br>

As far as I can see it's not necessary to do this. With gzipped files, if only<br>

the archive file itself is read into the cache when it is first uploaded, then<br>

you can browse through the archive in the web interface quite happily, as each<br>

time you open a subdirectory, the contents of that directory are read into the<br>

cache. I have used path regexes to prevent recursive scanning of archives, and<br>

this works for gzip files.<br>

<br>

Zip files, however, don't work the same way. Unless the whole archive is<br>

scanned into the file cache when it's first added, the archive is not browsable<br>

via web interface. I'm not sure why rescans are triggered for gzip archives<br>

and not for zip archives.<br>

<br>

However, even when recursive scanning of gzip archives is prohibited, the<br>

resources required to scan even a few files within an archive are impractical.<br>

On my dual core machine, a gzip file with only three subfiles (small images) and<br>

three subdirectories takes about 30 seconds to scan. Scanning the contents of<br>

the top level of any real world web app archive (like tinymce or phplist),<br>

which has about ten files / directories in its root folder, takes more than 5<br>

minutes.<br>

<br>

I'm currently trying to identify exactly where the bottleneck is - commenting<br>

out all of scanfile() (in filecache.php) doesn't ease things, so the issue must<br>

presumably lie somewhere in scan(). I think I need to get xdebug working again<br>

to investigate further.<br>

<br>

That's it from me for this week.<br>

<br>

Best,<br>

<br>

Sam.<br>

<br></blockquote></div><br></div>