[Owncloud] Sync Client needs server help
Thomas Müller
thomas.mueller at tmit.eu
Fri May 18 13:49:07 UTC 2012
On Friday, 18.05.2012 at 15:38, Klaas Freitag wrote:
> On 17.05.2012 22:26, Brad McEvoy wrote:
> Hi Brad,
>
> thanks for your interesting feedback! I think your post did not make it
> to the mailing list, but I'll forward it with this answer.
>
> >
> > I'm not a developer on ownCloud, but I did a dotcom startup a while back
> > trying to be a file sync service like Dropbox, but was a bit late to the
> > party.
> >
> > I'm now converting that to an open source project (see
> > https://github.com/Spliffy/spliffy - similar goals, but much less
> > advanced than ownCloud, Java based). I posted to this list a few months
> > back suggesting that we share experience and work towards a standards
> > based and interoperable toolset. I think standards and interoperability
> > would generally strengthen the open source offerings as opposed to the
> > closed source services currently proliferating.
> Yes, standards are good. So far I have tried to stay as close to WebDAV
> as possible, to keep the door open for interoperability.
> >
> > Regarding your question below I'd like to share my experience. I first
> > implemented path based sync, as you have done. I have since come to
> > believe this is far from optimal. And others from mature and established
> > sync product companies share that view.
> >
> > What git does, and I think this is a good model for any sync tool, is
> > calculate hashes (i.e. checksums) for files and for directories, where the
> > hash for a directory is the checksum of a formatted list of its members'
> > names and hashes. This means that the root folder has a hash which
> > uniquely identifies the current state of everything inside it. The
> > client can calculate the same hash for its contents. So, to check if
> > files are in sync you simply compare the hash of the root directories on
> > client and server. If they are different you walk down the directory
> > tree, ignoring directories that have the same hash on client and server,
> > and locating changed items based on their relative checksums. This is
> > very fast, very efficient, and very, very robust. It's easy to integrate
> > into a WebDAV server as it's just an extra property in PROPFIND or header
> > in a HEAD response. It requires server support so that any change to any
> > resource results in updated hashes right up to the synchronisation root.
>
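A minimal sketch of the hash tree Brad describes, to make the walk concrete.
Everything here is an assumption for illustration: directories are modelled as
nested dicts and files as bytes, SHA-1 is just one possible checksum, and the
"blob"/"tree" tags and the tab-separated listing are a made-up format, not
git's or Spliffy's. A real server would store the hashes rather than recompute
them on every comparison.

```python
import hashlib


def tree_hash(node) -> str:
    """Hash a node: bytes is a file, dict is a directory (name -> node)."""
    if isinstance(node, dict):
        # Directory hash = checksum of a sorted, formatted list of
        # member names and member hashes (hypothetical format).
        listing = "".join(
            f"{name}\t{tree_hash(child)}\n" for name, child in sorted(node.items())
        )
        return hashlib.sha1(b"tree\0" + listing.encode("utf-8")).hexdigest()
    return hashlib.sha1(b"blob\0" + node).hexdigest()


def changed_paths(local, remote, prefix=""):
    """Yield paths that differ, skipping subtrees whose hashes match."""
    if tree_hash(local) == tree_hash(remote):
        return  # identical subtree, nothing below needs checking
    if not (isinstance(local, dict) and isinstance(remote, dict)):
        yield prefix or "/"  # a changed file, or a file replaced by a dir
        return
    for name in sorted(set(local) | set(remote)):
        path = f"{prefix}/{name}"
        if name not in local or name not in remote:
            yield path  # exists on one side only
        else:
            yield from changed_paths(local[name], remote[name], path)
```

If only one file deep in the tree differs, the walk descends only through the
directories on the way to it and skips every sibling with an equal hash.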
> I understand the concept and indeed it's good. It's very near to what I
> want to implement, with the only difference that instead of the hash
> sums, I'd like to use the mtimes, as csync does. Why do we think that's a
> benefit? Well, based on the mtimes it's decidable which version is
> newer. Moreover, the mtime is already natural metadata in each file
> system, so we do not have to add something new. Given that, csync has so
> far run without server support.
>
But using mtimes requires synchronised clocks, which may not be possible to
set up properly on every system (e.g. web space where you have no influence
over the configuration). A checksum does not have that issue.
Just my two cents on that.
Tom
Using the checksum will also be the starting point for implementing de-duplication.
Maybe we will need it sooner or later anyway.
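
To illustrate why a checksum is a natural starting point for de-duplication,
here is a minimal sketch of a content-addressed store. The class name
BlobStore and the choice of SHA-256 are assumptions made for this example
only, not anything that exists in ownCloud.

```python
import hashlib


class BlobStore:
    """Hypothetical in-memory store that keeps each distinct content once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The checksum is the storage key, so identical content
        # uploaded twice maps to the same entry.
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

Two files with the same content then share one stored blob, whatever their
names or mtimes are.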
> What is missing is the propagation of the mtimes from individual files
> and directories to their parent directory. If we do that with the
> ownCloud server support, I think we will have the same benefits that you
> described above. As we have the data in a database server side we will
> be able to retrieve the data fast.
>
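The propagation Klaas describes could look roughly like the sketch below. It
assumes a plain dict from path to mtime standing in for the server-side
database, and the function name touch is made up for this example. It only
covers the bookkeeping; it does nothing about the clock issue mentioned above.

```python
import posixpath


def touch(mtimes: dict, path: str, mtime: int, root: str = "/") -> None:
    """Record a new mtime for path and bump every ancestor up to root."""
    mtimes[path] = mtime
    # Stop at the sync root; "/" and "" also end the loop so that a path
    # outside the root cannot make it run forever.
    while path not in (root, "/", ""):
        path = posixpath.dirname(path)
        # A directory carries the newest mtime of anything below it.
        mtimes[path] = max(mtimes.get(path, 0), mtime)
```

After each write the root entry holds the newest mtime in the whole tree, so a
client can compare a single value before deciding to descend.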
> >
> > Note that there is a related RFC - http://tools.ietf.org/html/rfc6578 -
> > however I'm not confident that the approach outlined there is quite right.
> Do you know if it's implemented in a WebDAV server already?
>
> > Of course finding what files are new or updated is one thing,
> > communicating those changes efficiently is another. Spliffy uses a
> > similar approach to Bup (https://github.com/apenwarr/bup) to split files
> > into blobs which are stable with respect to file changes. Only changed
> > blobs are transmitted.
> >
> > The hashsplitting algorithm is **very** simple, and if you're not doing
> > something like this yet I suggest you take a peek -
> > https://github.com/HashSplit4J/hashsplit-lib
> That's cool, and it is a problem we also still have on our list to tackle.
> I have stumbled over this already and wonder if there is a C or C++ lib for that.
>
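For readers who have not seen hashsplitting, a minimal sketch of the idea
follows. It is not the algorithm from bup or HashSplit4J: the rsync-style
rolling sums, the 48-byte window and the 6-bit mask (chunks of about 64 bytes
on average, far too small for real use) are all assumptions picked to keep the
example short.

```python
WINDOW = 48            # bytes covered by the rolling checksum (assumed value)
MASK = (1 << 6) - 1    # cut when the low 6 bits are all ones (assumed value)


def split(data: bytes) -> list:
    """Split data into chunks at content-defined boundaries."""
    chunks, start = [], 0
    s1 = s2 = 0  # rolling sums over the last WINDOW bytes
    for i, byte in enumerate(data):
        # Byte leaving the window; zero while the window is still filling.
        out = data[i - WINDOW] if i >= WINDOW else 0
        s1 += byte - out
        s2 += s1 - WINDOW * out
        # The decision depends only on the last WINDOW bytes, so an edit
        # early in the file moves boundaries only near the edit.
        if (s2 & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because boundaries follow the content rather than fixed offsets, inserting a
few bytes at the front of a file changes the first chunk or two and leaves the
rest identical, so only those need to be transmitted.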
> > Sorry for the long post, and I hope this is of some assistance.
> Great, I really appreciate your input.
>
> Best,
>
> Klaas
>
> >
> > On 17/05/2012 9:12 p.m., Klaas Freitag wrote:
> >> Hi,
> >>
> >> one of the biggest shortcomings of the sync client currently is that
> >> it does a full scan of the ownCloud directories via WebDAV to
> >> query the last modified times. That causes load and other trouble. It
> >> would be great to find out if something has changed server side more
> >> cheaply.
> >>
> >> We have the file system cache which also has the mod times in the
> >> database. My idea is now, instead of querying every single file, I
> >> just issue a HEAD request on the top sync directory and get the latest
> >> modtime of all files in that dir back. If that is newer than the one
> >> I know, I have to do a sync.
> >>
> >> I know that it could be even cooler, i.e. delivering the list of
> >> files back etc., but let's take small steps. Doing just one HEAD instead
> >> of querying the whole tree will already be great.
> >>
> >> The implementation seems easy: Just get all database IDs of the
> >> fscache table entries below the top directory of the sync dir and do
> >> kind of
> >> SELECT MAX(mtime) FROM fscache WHERE id in ( list-of-all-ids-in dir );
> >> That should be fast enough.
> >>
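A rough sketch of that lookup, using SQLite in place of the real server
database. The three-column fscache schema and the function names are
assumptions for this example, and the two steps (collect the IDs, then take
the maximum) are folded into one query that matches on the path prefix.

```python
import sqlite3


def latest_mtime(conn: sqlite3.Connection, sync_dir: str):
    """Newest mtime of any entry below sync_dir, or None if there is none."""
    prefix = sync_dir.rstrip("/") + "/"
    # substr() does a literal prefix match, so "%" or "_" in a
    # directory name cannot act as a wildcard.
    row = conn.execute(
        "SELECT MAX(mtime) FROM fscache WHERE substr(path, 1, ?) = ?",
        (len(prefix), prefix),
    ).fetchone()
    return row[0]


def needs_sync(conn: sqlite3.Connection, sync_dir: str, last_known: int) -> bool:
    """True if something below sync_dir is newer than the client's last value."""
    newest = latest_mtime(conn, sync_dir)
    return newest is not None and newest > last_known
```

Note that a maximum over mtimes cannot see deletions, since removing a file
does not produce a newer mtime; that is one reason to propagate changes to the
parent directories as well.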
> >> My question now is: How do we do that? Should we have another app
> >> called /files/sync? Or do we want to enhance the WebDAV server to be
> >> able to do the described logic if a HEAD request on a dir comes in?
> >>
> >> I think the latter is more "within the concept" of doing the sync via
> >> WebDAV, OTOH a sync app could be useful anyway for other sync related
> >> server support.
> >>
> >> What do you think?
> >>
> >> Thanks,
> >>
> >> Klaas
> >> _______________________________________________
> >> Owncloud mailing list
> >> Owncloud at kde.org
> >> https://mail.kde.org/mailman/listinfo/owncloud