[Nepomuk] Re: same nepomuk db for several users

Sat Dec 4 02:46:06 CET 2010

On Friday 03 December 2010 13:56:18 you wrote:
> ----- Original Message ----
> 
> > From: Evgeny Egorochkin <phreedom.stdin at gmail.com>
> > To: nepomuk at kde.org
> > Cc: Bruce Adams <tortoise_74 at yahoo.co.uk>; Sebastian  Trüg
> > <trueg at kde.org> Sent: Fri, December 3, 2010 12:19:20 AM
> > Subject: Re: [Nepomuk] Re: same nepomuk db for several users
> > 
> > On Thursday 02 December 2010 12:39:59 Bruce Adams wrote:
> > >  Hi,
> > >  
> > >     Sorry for jumping in.
> > > 
> > > Perhaps I missed the  point of a semantic desktop. Surely its critical
> > > and basic to be able to  share
> > > information between users. Imagine if we used different file names  for
> > > the files? It would be chaos.
> > > 
> > > I am concerned that  not considering a shared user model up front might
> > > mean a fundamental  redesign
> > > would be required to support it. Can you offer any  re-assurance?
> > 
> > Current APIs won't stop working if some other source of  metadata
> > appears, like
> > 
> > another user. As to the implementation, it's been  considered but there's
> > only
> > 
> > so much time devs have and this isn't the only  area which needs work.
> 
> One effect would on the ontology side. If two users both have the
> same set of relations describing say the rating for a file you need to add
> extra relations
> to indicate the user.

Nepomuk uses quads exactly for this purpose: to associate extra metadata with 
a triple. Currently this is mostly used to store the last modification time, 
data source and similar technical information, and it can naturally be 
extended to store sharing preferences, the user, etc.
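
To illustrate the idea, here is a minimal Python sketch of how a fourth 
element (the graph) lets two users keep their own rating for the same file. 
It does not use the real Nepomuk or Soprano API; the Quad type, the ex: 
properties and all URIs are made up for the example.

```python
from collections import namedtuple

# A quad is a triple plus the graph (context) the statement belongs to.
# Hypothetical model, not the actual Nepomuk storage layout.
Quad = namedtuple("Quad", "subject predicate obj graph")

store = [
    # Two users rate the same file; each statement lives in its own graph.
    Quad("file:/home/shared/report.pdf", "ex:rating", 8, "graph:g1"),
    Quad("file:/home/shared/report.pdf", "ex:rating", 3, "graph:g2"),
    # Metadata about a graph is stored as ordinary statements on the graph.
    Quad("graph:g1", "ex:createdBy", "user:alice", "graph:meta"),
    Quad("graph:g2", "ex:createdBy", "user:bob", "graph:meta"),
]


def graph_owner(store, graph):
    """Return the user recorded as creator of a graph, or None."""
    for q in store:
        if q.subject == graph and q.predicate == "ex:createdBy":
            return q.obj
    return None


def ratings_by_user(store, resource):
    """Map each user to the rating they gave the resource."""
    return {
        graph_owner(store, q.graph): q.obj
        for q in store
        if q.subject == resource and q.predicate == "ex:rating"
    }


ratings = ratings_by_user(store, "file:/home/shared/report.pdf")
```

The rating triples themselves are identical in shape for both users; only 
the graph they sit in tells them apart, so no extra "user" relation is 
needed in the ontology for ratings.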

> The more serious issue is security. With a single user model there is no
> need for
> any kind of security. Adding security or making a system multi-user as an
> after thought
> is non trivial if thought hasn't been given up front.

You are confusing two different issues: (1) sharing metadata between 
different Nepomuk instances and (2) keeping the data of several users in one 
Nepomuk instance.

(1) has no security issues, but it requires a sharing policy/preferences. 
Until we try, we don't know how users will prefer to specify the policy: 
per-triple settings, templates, blacklists, whitelists or, most likely, all 
of these and then some more. However, since the sharing policy is orthogonal 
to the data itself, we can experiment as much as we want to.
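
As a rough illustration of keeping the policy separate from the data, the 
following Python sketch expresses whitelists and blacklists as plain 
predicate functions applied at sharing time. The rule constructors, the ex: 
properties and the file URIs are all hypothetical; nothing here reflects an 
existing Nepomuk interface.

```python
def whitelist(predicates):
    """Rule: share only triples whose predicate is in the given set."""
    allowed = set(predicates)
    return lambda triple: triple[1] in allowed


def blacklist(subject_prefixes):
    """Rule: never share triples whose subject starts with a given prefix."""
    prefixes = tuple(subject_prefixes)
    return lambda triple: not triple[0].startswith(prefixes)


def shareable(triples, rules):
    """Keep the triples accepted by every rule; the data is left untouched."""
    return [t for t in triples if all(rule(t) for rule in rules)]


# Made-up example data: (subject, predicate, object).
triples = [
    ("file:/home/alice/music/song.ogg", "ex:rating", 9),
    ("file:/home/alice/private/diary.txt", "ex:rating", 2),
    ("file:/home/alice/music/song.ogg", "ex:comment", "note to self"),
]

policy = [
    whitelist(["ex:rating"]),
    blacklist(["file:/home/alice/private/"]),
]

shared = shareable(triples, policy)
```

Because the rules only read the triples, a different kind of policy (say, 
templates or per-triple flags) could be swapped in without changing how the 
data is stored.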

(2) additionally requires enforcing the sharing policy during SPARQL queries, 
which is basically a pipe dream, because no SQL/SPARQL backend known to me 
has per-triple security/visibility support, and if they ever gain it, there's 
no guarantee it will be flexible enough to support all the policies that 
users will need.

When it comes to metadata sharing, (1) is the generic case that we absolutely 
must implement no matter what.

(2) is just a shortcut solution which is unimplementable and which looks good 
on paper (supposedly faster than several instances, maybe slightly less 
space usage), but these advantages are mostly illusory: 
* to make queries fast, each user needs a personal set of indexes over their 
own "view" of the data set (referencing only the triples they have access 
to), which means that the storage overhead of separate stores vs a common 
store is not that great: space requirements are of the same order of 
magnitude;
* query performance depends only on the amount of data in the user's "view", 
so it's the same in both cases. The only apparent speed-up is in syncing 
metadata changes between users, which seems to be faster when users share the 
same db... until you consider that each user's indexes have to be updated on 
every data change, so you might save some context switches and that's it.
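
The storage argument can be put into numbers with a toy Python model. It 
assumes, purely for illustration, that storing a triple and storing one 
index entry for it each cost one unit, and that every user needs one index 
entry per triple they can see; the data set and visibility sets are invented.

```python
# Made-up data: each triple together with the set of users allowed to see it.
data = [
    (("a", "ex:rating", 5), {"alice", "bob"}),
    (("b", "ex:rating", 7), {"alice"}),
    (("c", "ex:tag", "x"), {"bob"}),
    (("d", "ex:tag", "y"), {"alice", "bob"}),
]
users = ["alice", "bob"]


def view(user):
    """Triples visible to one user."""
    return [t for t, visible_to in data if user in visible_to]


def separate_stores_cost():
    """One store per user: each keeps its own copy of its triples plus
    one index entry per triple."""
    triples = sum(len(view(u)) for u in users)
    index_entries = sum(len(view(u)) for u in users)
    return triples + index_entries


def common_store_cost():
    """One shared store: each triple is stored once, but every user still
    needs index entries for their own view."""
    triples = len(data)
    index_entries = sum(len(view(u)) for u in users)
    return triples + index_entries


separate = separate_stores_cost()
common = common_store_cost()
```

In this model the common store only saves the duplicated triple bodies; the 
per-user index entries, which grow with the size of each view, are needed 
either way, and query cost in both designs depends on len(view(user)).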

It is possible that in the future there will be multi-user Nepomuk 
instances, but for other reasons, and storing only public data, similar to 
what last.fm used to be: since all data is public, all users have the same 
"view" of the data, and this is where you can gain some speed/storage 
advantage.
-- 
Evgeny