[Owncloud] Filesystem merged into master

Christophe M meumeu1402 at gmail.com
Thu Feb 9 19:53:04 UTC 2012


Hi,

With this new system, can we still use the operating system's filesystem
directly?
If all my ownCloud users' files are also shared via Samba, can I still do that?
If I want to share all the user folders under /home with ownCloud, can I still
do that?

Well, the real question is: is it a file cache, or the only entry point for
file access?

Second question: I closed the browser before the scan finished (after
2 hours...) and now I can't see all my files. Is there any way to continue the
scan?

Thanks

Christophe

2012/2/9 Klaas Freitag <freitag at owncloud.com>

> On 08.02.2012 17:46, Robin Appelman wrote:
>
> Hi,
>
> I browsed a bit through the filecache code today and have some questions.
> I am nitpicking here a bit because I think if we want to scale up to many
> files this might become a performance bottleneck if we don't take care from
> the beginning. OTOH we can get real benefit from this cache, thanks for
> starting that :-)
>
>
>> Earlier today I merged the filesystem branch into master.
>> The filesystem branch held multiple improvements to the entire filesystem
>> infrastructure of ownCloud, including the option to access files outside
>> the user's home folder, and caching of file info in the database for quick
>> access.
>>
>
> - Table layout of fscache:
> * path and name columns: I think we should get rid of the name column to
> keep the table small and avoid redundancies. The name col is AFAICS only
> used in search(); there and in other places the name can easily be computed
> from the path.
> * user string: I would strongly stay away from a string-based user col,
> for two reasons: the string is more costly than an int, and the user name
> might not always be unique. Imagine we authenticate from two independent
> sources of user data (LDAP and local, for example); then there can be users
> with the same name. That's problematic anyway, but way better to handle if
> you have an id to an ownCloud user object that covers that kind of problem.
>
> BTW - wouldn't it make sense to drop the user dependency completely and
> create the fscache db within the users space, meaning one for every user,
> maybe even in memory? Not sure, have never tried.
>
> * mimetype normalisation: I think the mimetypes should be normalized. The
> mimetypes table can be cached in a var and the table becomes smaller.
>
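
A minimal sketch of what the table layout suggestions above might look like,
written in Python with SQLite purely for illustration (ownCloud itself is PHP).
The table and column names (mimetypes, user_id, mimetype_id) and the helper
functions are assumptions, not the actual ownCloud schema.

```python
import posixpath
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: no name column, integer user id, normalised mimetypes.
    CREATE TABLE mimetypes (
        id       INTEGER PRIMARY KEY,
        mimetype TEXT NOT NULL UNIQUE
    );
    CREATE TABLE fscache (
        id          INTEGER PRIMARY KEY,
        path        TEXT NOT NULL,
        user_id     INTEGER NOT NULL,
        mimetype_id INTEGER NOT NULL REFERENCES mimetypes(id)
    );
""")

_mime_ids = {}  # mimetype string -> id, cached in a variable


def mimetype_id(mimetype):
    """Return the id for a mimetype, hitting the database only on a cache miss."""
    if mimetype not in _mime_ids:
        conn.execute(
            "INSERT OR IGNORE INTO mimetypes (mimetype) VALUES (?)", (mimetype,)
        )
        row = conn.execute(
            "SELECT id FROM mimetypes WHERE mimetype = ?", (mimetype,)
        ).fetchone()
        _mime_ids[mimetype] = row[0]
    return _mime_ids[mimetype]


def put(path, user_id, mimetype):
    """Store one cache entry using the normalised mimetype id."""
    conn.execute(
        "INSERT INTO fscache (path, user_id, mimetype_id) VALUES (?, ?, ?)",
        (path, user_id, mimetype_id(mimetype)),
    )


def name_of(path):
    """Compute the file name from the path instead of storing it."""
    return posixpath.basename(path)


put("/alice/files/notes.txt", 1, "text/plain")
put("/alice/files/todo.txt", 1, "text/plain")
put("/bob/files/photo.jpg", 2, "image/jpeg")
```

Each distinct mimetype string is stored once, and the cache rows only carry
small integers.
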
> - Indexes:
> Currently existing indexes AFAICS:
>  index|parent_index     | oc_fscache |(parent ASC)
>  index|parent_name_index| oc_fscache |(parent ASC, name ASC)
>
> Some are missing IMO:
>  * on path -> used in get()
>  * on (name, user) -> used in search. This is a LIKE SELECT which is
> difficult anyway, see http://www.sqlite.org/optoverview.html#like_opt
> As said, I would try to get rid of name and possibly also of user.
>  * on (mimepart, user) ->  used in searchByMime
>
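
A small sketch of how the missing indexes could be added and checked, again in
Python with SQLite for illustration only. The column list is guessed from the
names mentioned above, and the index names path_index and mimepart_user_index
are made up. EXPLAIN QUERY PLAN is used to see whether SQLite would pick the
index for a given query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for oc_fscache; only the columns needed here.
    CREATE TABLE fscache (
        id       INTEGER PRIMARY KEY,
        parent   INTEGER,
        path     TEXT,
        name     TEXT,
        "user"   TEXT,
        mimepart TEXT,
        mtime    INTEGER
    );
    CREATE INDEX path_index ON fscache (path);
    CREATE INDEX mimepart_user_index ON fscache (mimepart, "user");
""")


def plan(sql, params):
    """Return SQLite's query plan for sql as one string (detail is the last column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


get_plan = plan(
    "SELECT mtime FROM fscache WHERE path = ?", ("/alice/files/a.txt",)
)
mime_plan = plan(
    'SELECT path FROM fscache WHERE mimepart = ? AND "user" = ?',
    ("image", "alice"),
)
print(get_plan)
print(mime_plan)
```

The exact wording of the plan differs between SQLite versions, but it should
name the index that is used for each query.
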
> - TRANSACTION
> For mass INSERTs, we should explicitly call BEGIN and END TRANSACTION.
>
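
To illustrate the transaction point, here is a minimal sketch in Python with
SQLite. The table layout and the insert_all() helper are assumptions for the
example. The connection is opened with isolation_level=None so that the
transaction is controlled only by the explicit BEGIN and COMMIT (COMMIT is
equivalent to END TRANSACTION in SQLite).

```python
import sqlite3


def insert_all(conn, rows):
    """Insert all (path, mtime) rows inside one explicit transaction."""
    conn.execute("BEGIN")
    try:
        for path, mtime in rows:
            conn.execute(
                "INSERT INTO fscache (path, mtime) VALUES (?, ?)", (path, mtime)
            )
        conn.execute("COMMIT")
    except Exception:
        # Undo the partial batch so the cache is not left half-filled.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None: the sqlite3 module does not open transactions itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE fscache (path TEXT PRIMARY KEY, mtime INTEGER)")

rows = [("/files/doc%d.txt" % i, 1328817184 + i) for i in range(1000)]
insert_all(conn, rows)
count = conn.execute("SELECT COUNT(*) FROM fscache").fetchone()[0]
```

Without the explicit transaction, each INSERT would be committed on its own,
which on a disk-backed database usually costs one sync per row.
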
> - while loops calling functions
> Code running in while loops (here often readdir over all files in a dir)
> often calls sub functions in which others are called... Each of them can
> issue SQL statements independently.
>
> Database interaction becomes faster if prepare is not called for each and
> every individual execution of a statement, but once, and the statement is
> then executed for a list of values. So it might make sense to call prepare
> in an outer function and hand the $query object to the called subs.
>
> One example for a loop is
> updateFolder() -> fileSystemWatcherWrite() -> scanFile() -> put()
>
> The readdir loop is in updateFolder(), and put() finally does UPDATE or
> INSERT statements. In between there are SELECTs here and there.
>
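
A rough Python/SQLite analogue of "prepare once, execute for a list of values".
Python's sqlite3 module has no explicit prepare call, so this sketch uses
executemany(), which takes the statement once together with a list of
parameter tuples; AFAIK the statement is then compiled once and re-bound per
row. The put_all() helper and the two-column table are assumptions for the
example.

```python
import sqlite3

# One statement text for the whole batch instead of one prepare per file.
UPSERT_SQL = "INSERT OR REPLACE INTO fscache (path, mtime) VALUES (?, ?)"


def put_all(conn, entries):
    """Write all (path, mtime) entries with a single statement and one commit."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(UPSERT_SQL, entries)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fscache (path TEXT PRIMARY KEY, mtime INTEGER)")

# The outer loop (e.g. over readdir) only collects values ...
put_all(conn, [("/files/a.txt", 100), ("/files/b.txt", 200)])
# ... and a later scan updates an existing entry through the same statement.
put_all(conn, [("/files/a.txt", 150)])

mtimes = dict(conn.execute("SELECT path, mtime FROM fscache"))
```
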
> Often this can be solved by first collecting all object data in code, for
> example the isUpdated thing: currently there is a loop over readdir calling
> the isUpdated() function, which does a prepare("SELECT mtime...") for each
> path. Maybe it would be better to first collect the paths like
> while( readdir ) pathlist.append(path)
> and then call something like
> SELECT mtime FROM fscache WHERE path IN (implode pathlist)
>
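
The "collect first, then one SELECT ... IN" idea could be sketched like this in
Python with SQLite. The cached_mtimes() helper, its chunk_size parameter and
the table layout are assumptions for the example. The paths are bound as
parameters rather than pasted into the SQL string, and the list is split into
chunks because SQLite limits the number of bound parameters per statement.

```python
import sqlite3


def cached_mtimes(conn, paths, chunk_size=500):
    """Return {path: mtime} for all given paths that are in the cache."""
    paths = list(paths)
    result = {}
    for start in range(0, len(paths), chunk_size):
        chunk = paths[start:start + chunk_size]
        placeholders = ",".join("?" * len(chunk))  # e.g. "?,?,?"
        sql = "SELECT path, mtime FROM fscache WHERE path IN (%s)" % placeholders
        result.update(conn.execute(sql, chunk).fetchall())
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fscache (path TEXT PRIMARY KEY, mtime INTEGER)")
conn.executemany(
    "INSERT INTO fscache (path, mtime) VALUES (?, ?)",
    [("/files/a.txt", 100), ("/files/b.txt", 200), ("/files/c.txt", 300)],
)

# Paths as they would be collected from a directory listing.
pathlist = ["/files/a.txt", "/files/c.txt", "/files/new.txt"]
mtimes = cached_mtimes(conn, pathlist)
```

Paths missing from the result (here /files/new.txt) are the ones that are not
cached yet and still need a scan.
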
> - paths: Paths can be complicated anyway, because there are many starting
> with the same string... I have seen systems which store a hash such as an
> MD5 in this kind of cache to have more powerful search support. Maybe that
> would be worth a try :-)
>
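
One possible reading of the hash idea, sketched in Python with SQLite: store
an MD5 of the path in an indexed column and look entries up by that
fixed-length key, so that long paths sharing a common prefix no longer matter
for exact-match lookups. The path_hash column and the helper functions are
assumptions for the example; note that a hash does not help with prefix or
LIKE searches.

```python
import hashlib
import sqlite3


def path_hash(path):
    """Fixed-length (32 hex characters) lookup key derived from the full path."""
    return hashlib.md5(path.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: the hash is the key, the path is kept for display.
    CREATE TABLE fscache (
        path_hash TEXT PRIMARY KEY,
        path      TEXT NOT NULL,
        mtime     INTEGER
    );
""")


def put(path, mtime):
    """Insert or overwrite the cache entry for path."""
    conn.execute(
        "INSERT OR REPLACE INTO fscache (path_hash, path, mtime) VALUES (?, ?, ?)",
        (path_hash(path), path, mtime),
    )


def get_mtime(path):
    """Look the entry up by hash; return None if the path is not cached."""
    row = conn.execute(
        "SELECT mtime FROM fscache WHERE path_hash = ?", (path_hash(path),)
    ).fetchone()
    return row[0] if row else None


put("/alice/files/documents/report.odt", 100)
put("/alice/files/documents/report-final.odt", 200)
```
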
> Some of the points I made are arguable and a bit fishy and depend on a
> lot of parameters such as the database, the kind of data etc. pp. It would
> be good to have a testing and performance measuring framework for this, I
> think, to really pin down the measurements.
>
> Again, sorry if that sounds like wise-guying, that's not intended. Thanks
> for picking up this difficult but important task. I am very happy to discuss
> and help wherever needed :-)
>
> regards,
>
> Klaas
>
> _______________________________________________
> Owncloud mailing list
> Owncloud at kde.org
> https://mail.kde.org/mailman/listinfo/owncloud
>

