[Owncloud] Scaling considerations

Aggelos Economopoulos aoiko at cc.ece.ntua.gr
Sun Jul 4 15:05:25 UTC 2010


On 29/06/2010 08:15 PM, Tobias Hunger wrote:
> Hi!
>> It struck me as odd that I couldn't find a discussion of the scaling
>> issues involved. Maybe I didn't look hard enough, maybe much of the
>> discussion takes place on irc or perhaps you have defined the problem
>> away :)
>
> Yep, there could be more information on the project homepage about this
> (and any other issue:-)

Yes, definitely :) But discussing the issues is a necessary first step 
in that direction.

>> In any case, I'm concerned about file sharing between friends and what
>> would happen if I was running owncloud on my home box (or even some server
>> somewhere) and some of my files became too popular for my connection. As
>> things are, I'd stand a good chance of getting (accidentally and without
>> warning) DDoS'ed out of the internet.
>
> Is that really that big a problem I wonder? You are inviting people to view
> your stuff after all, so you can just ask them to stop again:-)

Well, I invite 5 people to take a look at this file, but these people 
find it so interesting that they point their friends to it as well. One 
of those people posts the link on some forum somewhere. Suddenly I get a 
SYN storm from people that I don't know and of course can't contact 
except by responding with a static error page (if that :)

> This does look a bit different if you host content that is
> visible for everybody... but then running owncloud is no different
> than running any other web server on a server at your home. Just don't
> get anything hosted there mentioned on slashdot:-)

Yes, true. But since I am building up trust relationships as part of 
owncloud, this is a huge opportunity to make use of those relationships. 
E.g. my well-connected owncloud instance could automatically offer to 
mirror popular files of my immediate and/or trusted friends (or even 
files of their friends etc). This could be push-based. I am not claiming 
that this is the best approach and I'd be very interested in other 
suggestions.
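
To make the mirroring idea a bit more concrete, here is a minimal
sketch of the selection step a mirroring instance might run. Everything
in it is an assumption for illustration: the FileStats record, the
files_to_offer function, the idea that a friend's instance reports
hourly hit counts, and the threshold and storage budget values. None of
this is existing owncloud code.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file report received from a friend's instance."""
    owner: str           # who hosts the master copy
    path: str
    size_bytes: int
    hits_last_hour: int  # request count observed by the owner's instance


def files_to_offer(stats, trusted, min_hits=100, budget_bytes=10**9):
    """Pick popular files owned by trusted friends, most requested first,
    skipping any file that would exceed the mirror's storage budget."""
    candidates = [s for s in stats
                  if s.owner in trusted and s.hits_last_hour >= min_hits]
    candidates.sort(key=lambda s: s.hits_last_hour, reverse=True)

    chosen, used = [], 0
    for s in candidates:
        if used + s.size_bytes <= budget_bytes:
            chosen.append(s)
            used += s.size_bytes
    return chosen
```

The mirror would then send an offer for each chosen file to its owner,
who stays free to decline. How the trust set is built (immediate
friends only, or friends of friends) is left open here, as it is above.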

You're of course absolutely right that one can define the problem away. 
However, I think that you shouldn't. After all, you're competing against 
proprietary systems that offer virtually unlimited bandwidth (and mostly 
sufficient amounts of space and computation power) for "free". But more 
importantly, such an architecture connects the ability to reach a lot of 
people with the financial ability to pay for the resources (see below). 
You could argue that the existing approach of finding somebody who wants 
to "sponsor" your ideas/expression/whatever and can provide the resources 
is good enough, but IME there are issues. The most important one is 
that when there is later a clash of ideas, it is kinda hard to move your 
net presence elsewhere.

>> Secondly, I would consider it very important to allow for mobility.
>
> Moving your data is hard with a web-server based approach. Basically you need
> to move the data and then ask everybody to update their URLs.

Depends on the scenario. If I'm changing hosting providers (with my home 
machine potentially being one of the providers), I could just leave 
behind a (machine interpreted) redirect and people would lazily update 
to my new location url. But say my account has been shut down by the 
hosting provider (e.g. because of some policy rules). Then I could use 
my private key and a backup of my friends list/roster to credibly notify 
them of the url change (you could get more clever here).
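
For the first scenario, a rough sketch of what the machine-interpreted
redirect left behind at the old host could look like, assuming plain
HTTP and a 301 response. The new base URL, the new_location helper and
the MovedHandler class are made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical new location; not a real owncloud endpoint.
NEW_BASE_URL = "https://new-host.example.org/owncloud"


def new_location(path):
    """Map a path on the old host to the same path on the new host."""
    return NEW_BASE_URL + path


class MovedHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a permanent redirect."""

    def _redirect(self):
        # 301 marks the move as permanent, so a client may update its
        # stored URL the next time it follows the old link.
        self.send_response(301)
        self.send_header("Location", new_location(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def make_server(port=8080):
    """Build a server on localhost that serves only redirects."""
    return HTTPServer(("127.0.0.1", port), MovedHandler)
```

Calling make_server().serve_forever() on the old machine would keep
answering with redirects for as long as that machine stays reachable.
This does not cover the second scenario, where the old host is gone and
the notification has to come from the key holder instead.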

> A more "cloudy" approach would include things like encrypting data, applying
> a globally unique ID to each piece of data and uploading it into some form of
> storage network. P2P comes into this, too, considering ownCloud's "everybody
> can run their own server" idea...

Well, I don't know about encrypting... my concern was mostly about 
public files. If some files are encrypted for a group of people, I think 
it's highly unlikely that the group is large enough and the files are 
*that* big that computing/networking resources are going to be an issue.

> There are proposals on encrypted storage in the owncloud wiki, covering (parts
> of) this. Please comment on them!

We can all start proposing our favorite architectures, but IMHO that's a 
bit backwards. The more immediate questions have more to do with what we 
want out of the system.

Do we want it to scale without requiring the user to throw resources at 
the problem?
Do we want the identity to be in the hands of the user (so that a 
malicious hosting provider can only temporarily inconvenience its users)?
If we end up using keypairs, how do we handle key management (that's 
probably the most interesting question)?
Do we care about forgery?
Do we want to enable end-to-end crypto between arbitrary users or 
between "friends"?
Do we want to allow multiple master copies that can get out of sync?

Personally, I'd want a system where I don't have to pay for resources in 
proportion to my popularity -- if I did, this would be a big incentive 
to try to monetize my popularity. I don't think I need to remind anyone 
interested in owncloud that system architecture is not apolitical.

In addition, I'd like to be able to use a friend's server w/o trusting 
them with my identity. Even more so, I'd like to be able to provide 
hosting for friends' internet presence _without_ them having to trust me 
absolutely. This does not necessarily mean that all data on the server 
should be encrypted.

Not all data is equally valuable anyway and I'd like to be able to 
access some of it from a machine I don't trust.

As for securely exchanging data between users, I think it's something 
worth doing but I don't see that as a priority. The social problems with 
pushing adoption of solutions like owncloud are big enough as it is. 
/Requiring/ encrypted data exchange is not realistic at this point IMHO. 
Allowing for it is a worthy goal though.

Diverging versions of the data are another "interesting" problem. Not 
sure how important that is.

My point is, we need to be clear about the requirements before we 
discuss specific architecture and implementation issues.

That said, my approach to encrypted file storage would be much closer to 
the first proposal (sorry Tobias :)

So, what requirements do people consider necessary? Keep in mind the 
obvious tradeoff between usefulness/coolness of features on the one hand 
and the implementation effort and social acceptance/usability issues on 
the other.

Aggelos



