[Owncloud] Archive reading resource hog (bug oc-1224)

Wed Sep 19 19:06:12 UTC 2012

Hi Sam,

The bottleneck in OC_Archive_TAR is at [1]. Caching $headers in a
protected member of the OC_Archive_TAR class cut CPU load in half while
browsing a tar.gz archive.
Here is a call graph for browsing the tar archive [2]. As you can see,
tar->listContent() is called twice for each file in the archive: first
to get the size and then to get the MIME type.
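
To illustrate the caching idea, here is a minimal sketch in Python rather
than PHP. It is not the actual patch: the class CachedTarListing, its
methods, the scans counter and the sample archive are all hypothetical
names made up for this example, and Python's tarfile module stands in for
the tar library. The point is only that the header listing is read once
and then reused for both the size and the MIME lookup.

```python
import io
import mimetypes
import tarfile


class CachedTarListing:
    """Hypothetical wrapper that reads the archive headers once and reuses them."""

    def __init__(self, data: bytes):
        self._data = data
        self._headers = None  # cache, analogous to a protected $headers member
        self.scans = 0        # counts full header scans, for illustration only

    def _list_content(self):
        # Only walk the archive when the cache is empty.
        if self._headers is None:
            self.scans += 1
            with tarfile.open(fileobj=io.BytesIO(self._data), mode="r:*") as tar:
                self._headers = {m.name: m for m in tar.getmembers()}
        return self._headers

    def filesize(self, path: str) -> int:
        return self._list_content()[path].size

    def mimetype(self, path: str):
        # Look the entry up first so a missing path raises KeyError,
        # then guess the type from the file name alone.
        self._list_content()[path]
        return mimetypes.guess_type(path)[0]


def build_sample_archive() -> bytes:
    """Build a tiny in-memory tar.gz so the example is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in (("a.txt", b"hello"), ("b.txt", b"ownCloud")):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


archive = CachedTarListing(build_sample_archive())
names = ("a.txt", "b.txt")
sizes = {name: archive.filesize(name) for name in names}
mimes = {name: archive.mimetype(name) for name in names}
```

Without the cache, the four lookups above would each walk the archive
headers again; with it, archive.scans stays at 1 no matter how many files
are queried.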

[1] https://github.com/owncloud/core/blob/master/lib/archive/tar.php#L122
[2] https://owncube.com/public.php?service=files&token=e2a65ae1442997590c77e6eab72370cdce41e98d&file=/callgraph2.png

Hope this helps,
Victor

On Wed, Sep 19, 2012 at 9:08 PM, Sam Tuke <samtuke at owncloud.com> wrote:

> Here's a status update on bug oc-1224 that I've been working on.
>
> When files_archive is enabled, and a new archive is added to ownCloud, all
> subfiles and directories are opened on the server and added to the
> filecache.
> This hogs resources to a show-stopping extent.
>
> As far as I can see it's not necessary to do this. With gzipped files, if
> only the archive file itself is read into the cache when it is first
> uploaded, then you can browse through the archive in the web interface
> quite happily, as each time you open a subdirectory, the contents of that
> directory are read into the cache. I have used path regexes to prevent
> recursive scanning of archives, and this works for gzip files.
>
> Zip files, however, don't work the same way. Unless the whole archive is
> scanned into the file cache when it's first added, the archive is not
> browsable via the web interface. I'm not sure why rescans are triggered for
> gzip archives and not for zip archives.
>
> However, even when recursive scanning of gzip archives is prohibited, the
> resources required to scan even a few files within an archive are
> impractical. On my dual core machine, a gzip file with only three subfiles
> (small images) and three subdirectories takes about 30 seconds to scan.
> Scanning the contents of the top level of any real world web app archive
> (like tinymce or phplist), which has about ten files / directories in its
> root folder, takes more than 5 minutes.
>
> I'm currently trying to identify exactly where the bottleneck is.
> Commenting out all of scanfile() (in filecache.php) doesn't ease things, so
> the issue must presumably lie somewhere in scan(). I think I need to get
> xdebug working again to investigate further.
>
> That's it from me for this week.
>
> Best,
>
> Sam.
> _______________________________________________
> Owncloud mailing list
> Owncloud at kde.org
> https://mail.kde.org/mailman/listinfo/owncloud
>
>