[Owncloud] Archive reading resource hog (bug oc-1224)

Michael Gapczynski mtgap at owncloud.com
Wed Sep 19 19:14:31 UTC 2012


I see the long session_start call in that call graph as well. I was
investigating this issue a while back, but I didn't have the experience or
knowledge to improve it. I'm hoping we can take a look at our session handling
before ownCloud 5, because half of the page load time is spent on the basic
'are we logged in?' check.


Michael


On Wednesday, September 19, 2012 10:06:12 PM Victor Dubiniuk wrote:
> Hi Sam,
> 
> the bottleneck in OC_Archive_TAR is [1]. Caching $headers in a protected
> member of the OC_Archive_TAR class halved the CPU load while browsing a
> tar.gz archive.
> Here is a call graph for browsing the tar archive [2]. As you can see,
> tar->listContent() is called twice for each file in the archive: first to
> get the size and then to get the MIME type.
> 
> [1] https://github.com/owncloud/core/blob/master/lib/archive/tar.php#L122
> [2] https://owncube.com/public.php?service=files&token=e2a65ae1442997590c77e6eab72370cdce41e98d&file=/callgraph2.png
> 
> Hope this helps,
> Victor
> 
> On Wed, Sep 19, 2012 at 9:08 PM, Sam Tuke <samtuke at owncloud.com> wrote:
> > Here's a status update on bug oc-1224 that I've been working on.
> > 
> > When files_archive is enabled and a new archive is added to ownCloud, all
> > of its files and subdirectories are opened on the server and added to the
> > filecache. This hogs resources to a show-stopping extent.
> > 
> > As far as I can see, it's not necessary to do this. With gzipped files,
> > if only the archive file itself is read into the cache when it is first
> > uploaded, you can still browse through the archive in the web interface
> > quite happily: each time you open a subdirectory, the contents of that
> > directory are read into the cache. I have used path regexes to prevent
> > recursive scanning of archives, and this works for gzip files.
> > 
> > Zip files, however, don't work the same way. Unless the whole archive is
> > scanned into the filecache when it's first added, the archive is not
> > browsable via the web interface. I'm not sure why rescans are triggered
> > for gzip archives but not for zip archives.
> > 
> > However, even when recursive scanning of gzip archives is prohibited, the
> > resources required to scan even a few files within an archive are
> > impractical. On my dual-core machine, a gzip file with only three subfiles
> > (small images) and three subdirectories takes about 30 seconds to scan.
> > Scanning the top level of any real-world web app archive (like tinymce or
> > phplist), which has about ten files/directories in its root folder, takes
> > more than 5 minutes.
> > 
> > I'm currently trying to identify exactly where the bottleneck is.
> > Commenting out all of scanfile() (in filecache.php) doesn't ease things,
> > so the issue presumably lies somewhere in scan(). I think I need to get
> > xdebug working again to investigate further.
> > 
> > That's it from me for this week.
> > 
> > Best,
> > 
> > Sam.
> > _______________________________________________
> > Owncloud mailing list
> > Owncloud at kde.org
> > https://mail.kde.org/mailman/listinfo/owncloud
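A minimal sketch of Victor's suggestion, written in Python rather than PHP: the archive index is parsed once and kept on the object, so asking for both the size and the MIME type of every entry reuses a single parse instead of re-listing the archive on each call. CachedTarListing, make_demo_archive and the demo archive contents are hypothetical illustrations, not ownCloud code; Python's tarfile module stands in for OC_Archive_TAR and its listContent() call.

```python
import io
import mimetypes
import tarfile


class CachedTarListing:
    """Hypothetical analogue of caching $headers in OC_Archive_TAR."""

    def __init__(self, data: bytes):
        self._data = data        # raw .tar.gz bytes
        self._headers = None     # cached {name: TarInfo}, filled on first use
        self.parse_count = 0     # how many times the archive was actually parsed

    def _list_content(self):
        # The expensive step: decompress and walk every header in the archive.
        self.parse_count += 1
        with tarfile.open(fileobj=io.BytesIO(self._data), mode="r:gz") as tar:
            return {member.name: member for member in tar.getmembers()}

    def _get_headers(self):
        # Memoize: parse on the first call, reuse the result afterwards.
        if self._headers is None:
            self._headers = self._list_content()
        return self._headers

    def size(self, path: str) -> int:
        return self._get_headers()[path].size

    def mime(self, path: str) -> str:
        self._get_headers()[path]  # raises KeyError for unknown entries
        guessed, _ = mimetypes.guess_type(path)
        return guessed or "application/octet-stream"


def make_demo_archive() -> bytes:
    # Build a small tar.gz in memory: one directory and two files.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        folder = tarfile.TarInfo("images")
        folder.type = tarfile.DIRTYPE
        tar.addfile(folder)
        for name, payload in [("images/a.png", b"x" * 10), ("readme.txt", b"hello")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    listing = CachedTarListing(make_demo_archive())
    for entry in ("images/a.png", "readme.txt"):
        print(entry, listing.size(entry), listing.mime(entry))
    print("archive parsed", listing.parse_count, "time(s)")
```

Without the cached _headers, every size() and mime() call would re-open and re-walk the archive, so showing N entries would cost 2N parses instead of one. A real version would presumably also need to drop the cache whenever the archive is modified.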

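Sam's approach of stopping recursion at archive boundaries and reading only one directory level whenever a folder is opened could look roughly like the Python sketch below. The regex, the scan() signature, the FAKE_TREE listing and the dict used as a filecache are assumptions for illustration only; they are not the regexes Sam actually used or ownCloud's filecache code.

```python
import re

# Hypothetical pattern: a path segment ending in an archive extension and
# followed by "/" means the path points *inside* an archive.
INSIDE_ARCHIVE = re.compile(r"\.(?:tar\.gz|tgz|tar\.bz2|tar|zip)/", re.IGNORECASE)

# Hypothetical storage listing: directory path -> [(name, is_dir), ...].
# Archives are presented as directories, as files_archive does.
FAKE_TREE = {
    "/": [("docs", True)],
    "/docs": [("app.tar.gz", True), ("notes.txt", False)],
    "/docs/app.tar.gz": [("js", True), ("index.html", False)],
    "/docs/app.tar.gz/js": [("main.js", False)],
}


def is_inside_archive(path: str) -> bool:
    return INSIDE_ARCHIVE.search(path) is not None


def scan(path, list_dir, cache):
    """Cache the children of path; recurse, but never into archive contents."""
    for name, is_dir in list_dir(path):
        child = path.rstrip("/") + "/" + name
        cache[child] = is_dir
        # Anything below an archive is left for a later, on-demand scan.
        if is_dir and not is_inside_archive(child + "/"):
            scan(child, list_dir, cache)


if __name__ == "__main__":
    cache = {}
    ls = lambda p: FAKE_TREE.get(p, [])
    scan("/", ls, cache)                  # initial scan: archive itself only
    scan("/docs/app.tar.gz", ls, cache)   # user opens the archive
    print(sorted(cache))
```

One trade-off of matching on the path alone: an ordinary folder whose name happens to end in .zip or .tar would also be treated as an archive and scanned lazily.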
