[Owncloud] Filesystem merged into master

Klaas Freitag freitag at owncloud.com
Thu Feb 9 11:26:30 UTC 2012


On 08.02.2012 17:46, Robin Appelman wrote:

Hi,

I browsed a bit through the filecache code today and have some 
questions. I am nitpicking a bit here because I think that if we want to 
scale up to many files, this might become a performance bottleneck if we 
don't take care from the beginning. OTOH we can get real benefit from this 
cache, thanks for starting that :-)

> Earlier today I merged the filesystem branch into master.
> The filesystem branch held multiple improvements to the entire filesystem
> infrastructure of ownCloud, including the option to access files outside
> the user's home folder, and caching of file info in the database for quick
> access.

- Table layout of fscache:
* path and name columns: I think we should get rid of the name column to 
keep the table small and avoid redundancy. The name column is AFAICS only 
used in search(); there and in other places the name can easily be computed 
from the path.
* user string: I would strongly stay away from a string-based user column, 
for two reasons: a string is more costly than an int, and the user 
name might not always be unique. Imagine we authenticate from two 
independent sources of user data (LDAP and local, for example); then there 
can be users with the same name. That's problematic anyway, but much 
easier to handle if you have an id pointing to an ownCloud user object that 
covers this kind of problem.

BTW - wouldn't it make sense to drop the user dependency completely and 
create the fscache db within the user's space, meaning one for every 
user, maybe even in memory? Not sure, I have never tried.

* mimetype normalisation: I think the mimetypes should be normalized into 
a separate table. That mimetypes table can be cached in a variable, and 
the fscache table becomes smaller.
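
To make the layout points above concrete, here is a minimal sketch using Python's sqlite3 module. It is not the actual ownCloud schema: the column names user_id and mimetype_id, the mimetypes table layout, and the helpers mimetype_id(), put() and name_of() are all assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: no name column, an integer user id instead of a
# user name string, and mimetypes moved into their own small table.
conn.executescript("""
CREATE TABLE mimetypes (
    id       INTEGER PRIMARY KEY,
    mimetype TEXT NOT NULL UNIQUE
);
CREATE TABLE fscache (
    id          INTEGER PRIMARY KEY,
    path        TEXT NOT NULL,
    parent      INTEGER,
    user_id     INTEGER NOT NULL,
    mimetype_id INTEGER NOT NULL REFERENCES mimetypes(id),
    mtime       INTEGER
);
""")

# The mimetypes table is small, so its ids can be cached in a variable.
_mime_ids = {}


def mimetype_id(mimetype):
    """Return the id for a mimetype, inserting it on first use."""
    if mimetype not in _mime_ids:
        conn.execute(
            "INSERT OR IGNORE INTO mimetypes (mimetype) VALUES (?)",
            (mimetype,),
        )
        row = conn.execute(
            "SELECT id FROM mimetypes WHERE mimetype = ?", (mimetype,)
        ).fetchone()
        _mime_ids[mimetype] = row[0]
    return _mime_ids[mimetype]


def put(path, user_id, mimetype, mtime):
    """Store one cache entry, referencing the mimetype by id."""
    conn.execute(
        "INSERT INTO fscache (path, user_id, mimetype_id, mtime) "
        "VALUES (?, ?, ?, ?)",
        (path, user_id, mimetype_id(mimetype), mtime),
    )


def name_of(path):
    """Compute the file name from the path instead of storing it."""
    return path.rsplit("/", 1)[-1]


put("/klaas/files/a.txt", 1, "text/plain", 100)
put("/klaas/files/b.txt", 1, "text/plain", 200)
```

Both entries share one row in mimetypes, and the name is derived on demand, so neither string is stored twice.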

- Indexes:
Currently existing indexes AFAICS:
   index|parent_index     | oc_fscache |(parent ASC)
   index|parent_name_index| oc_fscache |(parent ASC, name ASC)

Some are missing IMO:
   * on path -> used in get()
   * on (name, user) -> used in search. This is a SELECT with LIKE, which is 
difficult to optimize anyway, see http://www.sqlite.org/optoverview.html#like_opt
As said, I would try to get rid of name and possibly also of user.
   * on (mimepart, user) ->  used in searchByMime

- TRANSACTION
For mass INSERTs, we should explicitly call BEGIN and END TRANSACTION, so 
that all rows are committed together instead of one by one.
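
A minimal sketch of the explicit transaction idea, again with Python's sqlite3 module. The two-column table and the insert_many() helper are assumptions for illustration only; COMMIT is used here, which SQLite treats the same as END TRANSACTION.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# module does not open transactions on its own and we control them.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE fscache (path TEXT, mtime INTEGER)")


def insert_many(rows):
    """Insert all rows inside one explicit transaction."""
    conn.execute("BEGIN")
    try:
        conn.executemany(
            "INSERT INTO fscache (path, mtime) VALUES (?, ?)", rows
        )
        conn.execute("COMMIT")
    except Exception:
        # Undo the partial batch so the cache is never half written.
        conn.execute("ROLLBACK")
        raise


insert_many([("/files/%d.txt" % i, 1000 + i) for i in range(1000)])
count = conn.execute("SELECT COUNT(*) FROM fscache").fetchone()[0]
```

On a file-backed database the difference is usually much larger than in memory, because every commit has to reach the disk.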

- while loops calling functions
Code running in while loops (here often readdir over all files in a dir) 
often calls sub functions in which others are called... Each of them can 
issue SQL statements independently.

Database interaction becomes faster if prepare is not called for each 
and every individual execution of a statement, but only once, with the 
statement then executed for a list of values. So it might make sense to 
call prepare in an outer function and hand the $query object to called subs.

One example for a loop is
updateFolder() -> fileSystemWatcherWrite() -> scanFile() -> put()

The readdir loop is in updateFolder(), and put() finally does the UPDATE 
or INSERT statements. In between there are SELECTs here and there.

Often this can be solved by first collecting all object data in code. 
Take the isUpdated thing, for example: currently there is a loop over 
readdir calling the isUpdated() function, which does a 
prepare( "SELECT mtime...") for each path. Maybe it would be better to 
first collect the paths like
while( readdir ) pathlist.append(path)
and then call something like
SELECT mtime from fscache WHERE path in (implode pathlist)
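
The collect-then-query idea could look roughly like this in Python with sqlite3. It is a sketch under assumptions: the mtimes_for() helper and the chunk size are made up, and chunking is only there because SQLite limits the number of bound parameters per statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fscache (path TEXT, mtime INTEGER)")
conn.executemany(
    "INSERT INTO fscache (path, mtime) VALUES (?, ?)",
    [("/a", 10), ("/b", 20), ("/c", 30)],
)


def mtimes_for(paths, chunk_size=500):
    """Fetch the cached mtime for many paths with few SELECTs.

    Paths are bound as parameters, never pasted into the SQL text.
    Paths that are not in the cache are simply absent from the result.
    """
    paths = list(paths)
    result = {}
    for start in range(0, len(paths), chunk_size):
        chunk = paths[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))
        sql = (
            "SELECT path, mtime FROM fscache WHERE path IN (%s)"
            % placeholders
        )
        for path, mtime in conn.execute(sql, chunk):
            result[path] = mtime
    return result


# Collect the paths first (here a fixed list instead of a readdir loop),
# then ask the database once.
cached = mtimes_for(["/a", "/b", "/missing"])
```

The caller can then compare the returned mtimes with the values from the file system in plain code, without another query per file.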

- paths: Paths can be complicated anyway, because there are many 
starting with the same string... I have seen systems which store a hash 
such as an MD5 in this kind of cache to have more powerful search 
support. Maybe that would be worth a try :-)
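
One way the hash idea might be tried, as a sketch only: the path_hash column and the helpers path_hash(), put() and get_mtime() are assumptions, not existing ownCloud code. Note that a hash key only helps exact lookups, not prefix or LIKE searches.

```python
import hashlib
import sqlite3


def path_hash(path):
    """Fixed-length key for a path: the MD5 hex digest of its UTF-8 bytes."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# The hash is the primary key, so lookups compare short fixed-length
# strings instead of long paths that share a common prefix.
conn.execute(
    "CREATE TABLE fscache (path_hash TEXT PRIMARY KEY, "
    "path TEXT NOT NULL, mtime INTEGER)"
)


def put(path, mtime):
    """Insert or overwrite the cache entry for a path."""
    conn.execute(
        "INSERT OR REPLACE INTO fscache (path_hash, path, mtime) "
        "VALUES (?, ?, ?)",
        (path_hash(path), path, mtime),
    )


def get_mtime(path):
    """Look up an entry by the hash of its path; None if not cached."""
    row = conn.execute(
        "SELECT mtime FROM fscache WHERE path_hash = ?",
        (path_hash(path),),
    ).fetchone()
    return row[0] if row else None


put("/klaas/files/some/deep/folder/a.txt", 42)
put("/klaas/files/some/deep/folder/a.txt", 43)  # replaces the first entry
```

Whether this is actually faster than an index on the path column depends on the database and the data, so it would need measuring.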

Some of the points I made are arguable and a bit fishy, and depend on a 
lot of parameters such as the database, the kind of data etc. I think it 
would be good to have a testing and performance measuring framework for 
this, to back the decisions with real measurements.

Again, sorry if that sounds like wise-guying, that's not intended. Thanks 
for picking up this difficult but important task. I am very happy to 
discuss and help wherever needed :-)

regards,

Klaas


