[Owncloud] Filesystem merged into master

Robin Appelman icewind at owncloud.com
Fri Feb 10 12:26:58 UTC 2012


On Thursday 09 February 2012 12:26:30 Klaas Freitag wrote:
> On 08.02.2012 17:46, Robin Appelman wrote:
> 
> Hi,
> 
> I browsed a bit through the filecache code today and have some
> questions. I am nitpicking here a bit because I think if we want to
> scale up to many files this might become a performance bottleneck if we
> don't take care from the beginning. OTOH we can get real benefit from this
> cache, thanks for starting that :-)
> 
> > Earlier today I merged the filesystem branch into master.
> > The filesystem branch held multiple improvements to the entire
> > filesystem
> > infrastructure of ownCloud, including the option to access the files
> > outside the users home folder, and caching of file info in the database
> > for quick access.
> 
> - Table layout of fscache:
> * path and name columns: I think we should get rid of the name column to
> keep the table small and avoid redundancies. The name col is AFAICS only
> used in search(), there and in other places the name can be easily computed.
> * user string: I would strongly stay away from a string based user col, for
> two reasons: The string is more costly than an int, and the user name
> might not always be unique. Imagine we authenticate from two
> independent sources of user data (LDAP and local for example), then there
> can be users with the same name. That's problematic anyway, but way
> better to handle if you have an id to an owncloud user object that
> covers that kind of problem.
> 
> BTW - wouldn't it make sense to drop the user dependency completely and
> create the fscache db within the users space, meaning one for every
> user, maybe even in memory? Not sure, have never tried.
> 
> * mimetype normalisation: I think the mimetypes should be normalized.
> The mimetypes table can be cached in a var and the table becomes smaller.
> 
> - Indexes:
> Currently existing indexes AFAICS:
>    index|parent_index     | oc_fscache |(parent ASC)
>    index|parent_name_index| oc_fscache |(parent ASC, name ASC)
> 
> Some are missing IMO:
>    * on path -> used in get()
>    * on (name, user) -> used in search. This is a LIKE SELECT which is
> difficult anyway, see http://www.sqlite.org/optoverview.html#like_opt
> As said, I would try to get rid of name and possibly also of user.
>    * on (mimepart, user) ->  used in searchByMime
> 
> - TRANSACTION
> For mass INSERTs, we should explicitly call BEGIN and END TRANSACTION
> 
> - while loops calling functions
> Code running in while loops (here often readdir over all files in a dir)
> often calls sub functions in which others are called... Each of them can
> do SQL statements independently.
> 
> Database interaction becomes faster if prepare is not called
> for each and every individual execution of a statement, but once, and the
> statement is then executed for a list of values. So it might make sense to
> call prepare in an outer function and hand the $query object to called subs.
> 
> One example for a loop is
> updateFolder() -> fileSystemWatcherWrite() -> scanFile() -> put()
> 
> in updateFolder is the readdir loop and put() finally does UPDATE or
> INSERT statements. In between there are SELECTs here and there.
> 
> Often this can be solved by first collecting all object data in code,
> for example the isUpdated thing: Now there is a loop over readdir,
> calling the isUpdated() function, it does a prepare( "SELECT mtime...")
> for each path. Maybe it would be better to first collect the paths like
> while( readdir ) pathlist.append(path)
> and then call something like
> SELECT mtime from fscache WHERE path in (explode pathlist)
> 
> - paths: Paths can be complicated anyway, because there are many
> starting with the same string... I have seen systems which store a hash
> such as an MD5 in this kind of cache to have more powerful search
> support. Maybe that would be worth a try :-)
> 
> Some of the points I made are arguable and a bit fishy and depend on a
> lot of parameters such as the database, the kind of data etc. pp. I think
> it would be good to have a testing and performance measuring framework for
> this, to really pin down the measurements.
> 
> Again, sorry if that sounds like wise-guying, that's not intended. Thanks
> for picking this difficult but important task. I am very happy to
> discuss and help wherever needed :-)
> 
> regards,
> 
> Klaas
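
The transaction and batched-lookup suggestions above can be sketched roughly as
follows. This is only an illustration in Python against an in-memory SQLite
database, not ownCloud's PHP code: the two-column fscache layout and the helper
names open_cache, cache_mtimes and fetch_mtimes are assumptions made up for the
example.

```python
import sqlite3


def open_cache():
    """Create a throwaway in-memory cache table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fscache (path TEXT PRIMARY KEY, mtime INTEGER)")
    return conn


def cache_mtimes(conn, entries):
    """Insert or update many (path, mtime) rows in a single transaction."""
    # "with conn" commits once at the end, or rolls back on an exception,
    # instead of paying for one commit per row.
    with conn:
        # executemany reuses one statement for the whole list of values.
        conn.executemany(
            "INSERT OR REPLACE INTO fscache (path, mtime) VALUES (?, ?)",
            entries,
        )


def fetch_mtimes(conn, paths, chunk_size=500):
    """Return {path: mtime} using one SELECT per chunk of paths."""
    result = {}
    paths = list(paths)
    # Chunking keeps the number of bound parameters below the limits
    # that some database builds impose on a single statement.
    for start in range(0, len(paths), chunk_size):
        chunk = paths[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT path, mtime FROM fscache WHERE path IN ({placeholders})",
            chunk,
        )
        result.update(rows)  # each row is a (path, mtime) pair
    return result
```

A readdir-style loop would first collect all paths of a directory, call
fetch_mtimes once, and compare the results in code, rather than issuing one
SELECT per file.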

A problem is that MySQL can't handle indexes longer than 333 characters when 
using utf8, which can be too limited, and even keys of 333 characters (999 
bytes) don't seem too efficient to me.
Adding a column with the hash of the path and indexing that seems like a proper 
solution to me.
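
A minimal sketch of that idea, written in Python against SQLite purely so it can
be run standalone (the concern above is about MySQL). The path_hash column, the
index name and the helpers put and get_mtime are hypothetical, and MD5 is used
only because it was the hash mentioned earlier in the thread:

```python
import hashlib
import sqlite3


def path_hash(path):
    """Fixed-length (32 hex chars) key derived from an arbitrarily long path."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


def open_cache():
    """Create a throwaway in-memory table with an indexed hash column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fscache ("
        "path TEXT NOT NULL, path_hash TEXT NOT NULL, mtime INTEGER)"
    )
    # The index covers the short hash instead of the long path itself.
    # Assumption: hash collisions between different paths are ignored here;
    # with this unique index a colliding path would replace the older row.
    conn.execute("CREATE UNIQUE INDEX fscache_path_hash ON fscache (path_hash)")
    return conn


def put(conn, path, mtime):
    """Insert or update the cache entry for one path."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO fscache (path, path_hash, mtime) "
            "VALUES (?, ?, ?)",
            (path, path_hash(path), mtime),
        )


def get_mtime(conn, path):
    """Look the entry up via the hash index, then confirm the full path."""
    row = conn.execute(
        "SELECT mtime FROM fscache WHERE path_hash = ? AND path = ?",
        (path_hash(path), path),
    ).fetchone()
    return row[0] if row else None
```

The hash only helps exact lookups such as get(); prefix or substring searches
on the path would still need a different approach.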

How would search be implemented if the name column is removed?

We are stuck with a string as user id for the moment, changing that would 
involve a lot of changes around the code.

Thanks for the feedback anyway, focusing on getting the bloody thing to work 
can leave performance in the shadow sometimes :)

 - Robin Appelman


