[Owncloud] Archive reading resource hog (bug oc-1224)

Victor Dubiniuk victor.dubiniuk at gmail.com
Wed Sep 19 20:06:15 UTC 2012


Handling a session is quite an expensive operation for PHP in general. As
you know, by default it stores the session data in ordinary files.

As Sam wrote in his message, the problem with uploaded archives
originates from the OC_FileCache post_write hook and is definitely related
to OC_FileCache::scan. Since several operations there involve
OC_FilesystemView, any call to OC_FilesystemView::basicOperation can be
responsible for triggering another post_write hook.
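
To make the re-triggering concrete, here is a minimal sketch of a
re-entrancy guard around a post_write handler. HookDemo and all of its
methods are hypothetical stand-ins, not the ownCloud API, and a guard like
this is only one possible approach, not something that has been agreed on
in this thread.

```php
<?php
// Hypothetical stand-in: HookDemo is NOT an ownCloud class.
class HookDemo {
    /** @var bool true while the post_write handler is already running */
    private static $scanning = false;
    /** @var int number of scans actually performed */
    public static $scanCount = 0;

    // Simulates a filesystem write that fires the post_write hook.
    public static function write($path) {
        // ... the real write would happen here ...
        self::postWrite($path);
    }

    // Hook handler: scans the path, and the scan itself performs writes.
    public static function postWrite($path) {
        if (self::$scanning) {
            return; // nested call caused by our own scan: ignore it
        }
        self::$scanning = true;
        try {
            self::scan($path);
        } finally {
            self::$scanning = false; // always release the guard
        }
    }

    private static function scan($path) {
        self::$scanCount++;
        // Updating cache entries goes through write() again, which would
        // re-trigger postWrite() endlessly without the guard above.
        self::write($path . '/entry');
    }
}

HookDemo::write('archive.tar.gz');
echo HookDemo::$scanCount, "\n"; // prints 1
```

With the guard in place the scan runs once per outer write. Without it,
the nested write would call the handler again and the recursion would
never end.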

I would just like to stress that there is a great opportunity here to
improve overall performance for *.tgz and *.bz2 files.
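
As an illustration of the header caching described in the quoted message
below, here is a minimal sketch. CachedTarReader, its methods and the
sample header fields are hypothetical and are not the actual
OC_Archive_TAR code. The closure stands in for an expensive
tar->listContent() call, and mtime() stands in for any second per-file
lookup.

```php
<?php
// Hypothetical sketch: CachedTarReader is NOT the real OC_Archive_TAR.
class CachedTarReader {
    /** @var callable expensive function returning all entry headers */
    private $lister;
    /** @var array|null cached headers, null until the first read */
    protected $headers = null;
    /** @var int how many times the expensive listing actually ran */
    public $listCalls = 0;

    public function __construct($lister) {
        $this->lister = $lister;
    }

    // Returns all entry headers, reading the archive only once.
    protected function getHeaders() {
        if ($this->headers === null) {
            $this->listCalls++;
            $this->headers = call_user_func($this->lister);
        }
        return $this->headers;
    }

    // Looks up one entry's header by file name, or null if absent.
    private function getHeader($file) {
        foreach ($this->getHeaders() as $header) {
            if ($header['filename'] === $file) {
                return $header;
            }
        }
        return null;
    }

    public function filesize($file) {
        $header = $this->getHeader($file);
        return $header === null ? null : $header['size'];
    }

    public function mtime($file) {
        $header = $this->getHeader($file);
        return $header === null ? null : $header['mtime'];
    }
}

// Stand-in for the expensive listing; a real one would parse the archive.
$reader = new CachedTarReader(function () {
    return array(
        array('filename' => 'a.png', 'size' => 120, 'mtime' => 1348084800),
        array('filename' => 'b.png', 'size' => 340, 'mtime' => 1348084900),
    );
});

$sizeA = $reader->filesize('a.png');
$mtimeB = $reader->mtime('b.png');
echo $reader->listCalls, "\n"; // prints 1: the listing ran only once
```

A real implementation would also have to reset the cached headers
whenever the archive is modified.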

---
Victor

On Wed, Sep 19, 2012 at 10:14 PM, Michael Gapczynski <mtgap at owncloud.com> wrote:

> I see the long session_start call on there as well. I was investigating
> this issue a while back, but I didn't have the experience or knowledge to
> improve it. I'm hoping we can take a look at our session handling before
> ownCloud 5, because half of page load time is just the basic 'are we
> logged in?'.
>
>
> Michael
>
>
> On Wednesday, September 19, 2012 10:06:12 PM Victor Dubiniuk wrote:
> > Hi Sam,
> >
> > the bottleneck of OC_Archive_TAR is [1]. Caching $headers in a
> > protected member of the OC_Archive_TAR class reduced CPU load by half
> > while browsing a tar.gz.
> > Here is a call graph for browsing the tar archive [2]. As you can see,
> > tar->listContent() is called twice for each file in the archive: first
> > to get the size and then to get the mime type.
> >
> > [1] https://github.com/owncloud/core/blob/master/lib/archive/tar.php#L122
> > [2] https://owncube.com/public.php?service=files&token=e2a65ae1442997590c77e6eab72370cdce41e98d&file=/callgraph2.png
> >
> > Hope this helps,
> > Victor
> >
> > On Wed, Sep 19, 2012 at 9:08 PM, Sam Tuke <samtuke at owncloud.com> wrote:
> > > Here's a status update on bug oc-1224 that I've been working on.
> > >
> > > When files_archive is enabled, and a new archive is added to ownCloud,
> > > all subfiles and directories are opened on the server and added to the
> > > filecache.
> > > This hogs resources to a show-stopping extent.
> > >
> > > As far as I can see it's not necessary to do this. With gzipped files,
> > > if only the archive file itself is read into the cache when it is first
> > > uploaded, then you can browse through the archive in the web interface
> > > quite happily, as each time you open a subdirectory, the contents of
> > > that directory are read into the cache. I have used path regexes to
> > > prevent recursive scanning of archives, and this works for gzip files.
> > >
> > > Zip files, however, don't work the same way. Unless the whole archive
> > > is scanned into the file cache when it's first added, the archive is
> > > not browsable via the web interface. I'm not sure why rescans are
> > > triggered for gzip archives and not for zip archives.
> > >
> > > However, even when recursive scanning of gzip archives is prohibited,
> > > the resources required to scan even a few files within an archive are
> > > impractical. On my dual core machine, a gzip file with only three
> > > subfiles (small images) and three subdirectories takes about 30 seconds
> > > to scan. Scanning the contents of the top level of any real world web
> > > app archive (like tinymce or phplist), which has about ten files /
> > > directories in its root folder, takes more than 5 minutes.
> > >
> > > I'm currently trying to identify exactly where the bottleneck is.
> > > Commenting out all of scanfile() (in filecache.php) doesn't ease things,
> > > so the issue must presumably lie somewhere in scan(). I think I need to
> > > get xdebug working again to investigate further.
> > >
> > > That's it from me for this week.
> > >
> > > Best,
> > >
> > > Sam.
> > > _______________________________________________
> > > Owncloud mailing list
> > > Owncloud at kde.org
> > > https://mail.kde.org/mailman/listinfo/owncloud