[Nepomuk] Brainstorming: Metadata Sharing
Sebastian Trüg
trueg at kde.org
Wed Jun 30 13:30:49 CEST 2010
The more I read your discussion, the more convinced I am that caching
remote metadata locally is a problem, for the two reasons Artem gave:
1. What to copy? In the worst case the whole graph is connected, and we
lose information unless we copy everything.
2. How to make sure the data is up-to-date?
Still, let me outline the ideas we came up with at the last Nepomuk
workshop:
We wanted to allow users to share certain resources or bits of metadata
with other users. This information would then be copied to the
interested parties, with its origin marked in the enclosing graphs. The
resource URIs would not be changed since they are unique (this is a
theoretical assumption that sadly does not hold in reality, since the
QUuid implementation creates a lot of duplicates; more on that
below). In the case of files, however, the nie:urls would be a special
kind of URL, something like telepathy:/<user>/<path-on-users-host>, so
that KIO could handle them automatically if they appear as search results.
Another approach for resource URIs was to have a layer between local and
remote data that adds username information to the URIs. An example would be:
nepomuk:/res/<UUID>
becomes
nepomuk:/<username>@<something>/res/<UUID>
when copied to other hosts.
This would solve the problem of UUIDs not being unique while keeping the
nepomuk:/ protocol and allowing a simple conversion in both directions.
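The two-way conversion described above could be sketched in Python as follows. This is a minimal sketch, not Nepomuk code: the function names `to_remote` and `to_local` are hypothetical, the open `<something>` component is modelled as an opaque `origin` string (for example a host or account name), and the UUID is treated as any run of characters without a slash.

```python
import re
from typing import Tuple

# Hypothetical URI layouts taken from the proposal:
#   local:  nepomuk:/res/<UUID>
#   remote: nepomuk:/<username>@<something>/res/<UUID>
# "<something>" is left open in the thread; here it is an opaque "origin".
LOCAL_RE = re.compile(r"^nepomuk:/res/(?P<uuid>[^/]+)$")
REMOTE_RE = re.compile(
    r"^nepomuk:/(?P<user>[^@/]+)@(?P<origin>[^/]+)/res/(?P<uuid>[^/]+)$"
)


def to_remote(uri: str, user: str, origin: str) -> str:
    """Rewrite a local resource URI before it is copied to another host."""
    m = LOCAL_RE.match(uri)
    if m is None:
        raise ValueError(f"not a local resource URI: {uri}")
    return f"nepomuk:/{user}@{origin}/res/{m.group('uuid')}"


def to_local(uri: str) -> Tuple[str, str, str]:
    """Split a remote URI back into (user, origin, original local URI)."""
    m = REMOTE_RE.match(uri)
    if m is None:
        raise ValueError(f"not a remote resource URI: {uri}")
    return m.group("user"), m.group("origin"), f"nepomuk:/res/{m.group('uuid')}"


local = "nepomuk:/res/1b2c3d4e-0000-4000-8000-000000000001"
remote = to_remote(local, "sebastian", "example.org")
print(remote)  # nepomuk:/sebastian@example.org/res/1b2c3d4e-0000-4000-8000-000000000001
print(to_local(remote) == ("sebastian", "example.org", local))  # True
```

Because the user name becomes part of the URI, two users whose resources happen to share a (duplicate) UUID still end up with distinct URIs on the receiving host.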
So maybe a middle way would be: we only copy the metadata that we "need"
or "want". We never copy Strigi-extracted data (since we can recreate
that once we have the file), but when copying the file we copy the metadata
the other user created, such as ratings or comments, but also relations to
other things. These things will then be referenced with URIs as above
and can be fetched on demand.
Thus there would be no need to store all related resources, and if we
want to use those resources in queries, we query the remote client again.
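The selection rule above (skip Strigi-extracted statements, keep user-created ones, and only reference related resources instead of copying them) might look like this as a sketch. The `Statement` type, its `extracted` flag, and the rule that any object starting with `nepomuk:/` is a resource reference are assumptions made for illustration; in a real store the distinction would presumably be derived from the enclosing graph. The property names in the sample data are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    extracted: bool  # hypothetical flag: True if Strigi produced this statement


def select_for_sharing(statements, file_uri):
    """Return (statements to copy, resource URIs to fetch on demand)."""
    copied = []
    to_fetch = set()
    for st in statements:
        # Only statements about the shared file are considered.
        if st.subject != file_uri:
            continue
        # Strigi data is skipped: the receiver can re-extract it from the file.
        if st.extracted:
            continue
        copied.append(st)
        # Assumption: objects with a nepomuk:/ URI are resources; they are
        # referenced by URI and fetched later, not copied recursively.
        if st.obj.startswith("nepomuk:/"):
            to_fetch.add(st.obj)
    return copied, to_fetch


statements = [
    Statement("nepomuk:/res/file1", "nie:title", "Bolero", True),
    Statement("nepomuk:/res/file1", "nao:numericRating", "8", False),
    Statement("nepomuk:/res/file1", "nao:description", "Great live take", False),
    Statement("nepomuk:/res/file1", "nao:isRelated", "nepomuk:/res/contact1", False),
    Statement("nepomuk:/res/contact1", "nco:fullname", "Some Conductor", False),
]
copied, to_fetch = select_for_sharing(statements, "nepomuk:/res/file1")
print(len(copied), sorted(to_fetch))  # prints: 3 ['nepomuk:/res/contact1']
```

Note that the contact's own statements are not copied at all; only its URI travels with the file, which is what keeps the copy from growing along the graph.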
Cheers,
Sebastian
On 06/30/2010 12:12 AM, Artem Serebriyskiy wrote:
>
>
> On Tue, Jun 29, 2010 at 11:31 PM, Vishesh Handa <handa.vish at gmail.com> wrote:
>
> I was planning to have #3 (no synchronization), but I think #1
> would be better. #2 is a compromise between the two that won't
> really help us. Relations are what define resources. Without them,
> they are just unique identifiers.
>
> #1 seems interesting, but we'll have to implement a way to
> synchronize the offline metadata when the user comes online or
> performs another search. There is the added problem of privacy.
> Suppose user B holds A's metadata, and A marks its metadata as no
> longer sharable: we should probably delete A's metadata from B, but
> B could possibly prevent this or make copies.
>
> As for the whole object-being-a-resource problem: yes, we'll have to
> copy everything about the object as well. There is no way to avoid
> it, but typically every resource has at most 20
> properties, so it shouldn't be that much.
>
>
> I am not so sure about that. Some resources have properties that point to
> other resources. If we copy music_file_resource, should we copy the
> nco:contact that represents its conductor? If yes, should we copy
> the friends of this conductor? If no, how do we execute the query "Find
> all music that was written by any of the friends of this <contact_name>"?
> If yes, should we copy the friends of the friends of ... of the
> friends of a contact? Where/when should we stop?
>
> About revoking permissions: the copyright of the local copy of the
> shared metadata must be well defined. If the license for this metadata is
> not free, then I must be asked to confirm that I accept this
> license. If the license is free, then the local copy of the metadata is my
> metadata, and I (and most people too) would definitely not allow anyone
> to delete something on my computer without my permission.
>
>
>
> BTW, it would be better to have resource uris in the form of
> nepomuk:/telepathy/whatever instead of telepathy:/nepomuk/whatever.
> There are loads of checks in the Nepomuk Core and services which
> make decisions based on a URL's scheme. So it would be better to
> stick to "nepomuk".
>
>
> *What relations should be copied?*
> One option would be to copy all the relations (shouldn't be too
> much), but I think we should only copy the relations returned by
> the result of a query. That way it would be a lot easier to
> determine what relations other users are allowed to access, and
> it would prevent copying of relations the user doesn't want to share.
> Copying only the transferred relations would require fewer requests.
>
> Even copying some of the relations will not solve some of our
> problems. I don't see any way to check that
> 1) the stored copy of the metadata is correct (maybe the remote user has
> just fixed some very serious bugs in his metadata, and we still have
> the buggy version), and
> 2) no new resources matching our query have appeared in any of the
> remote hosts' Nepomuk storages,
> without performing the query on the remote host. This means that every
> `global` query must always be executed on all available
> remote hosts, or we must be satisfied with the
> probably-obsolete-or-incorrect metadata we have stored locally. In this
> case, caching any of the relations locally seems useless.
>
> *P2P Network*
> That was the whole idea behind metadata sharing. Hopefully I'll be
> able to implement some form of it. :)
> *Who to query?*
> One option is to have select queries, where the user chooses
> the contact before performing the query. The other is to perform the
> query on all the contacts (might be a little expensive).
>
> *Sharing others' metadata*
> User A has some fragment of B's metadata. If C queries A, should
> it also return B's metadata? If yes, how do we deal with privacy?
> *Forwarded Queries?*
> Should a contact forward its query to its other clients? If we
> forget about privacy for a second, this would be amazing. This would
> be ideal for storing a huge knowledge base on multiple systems.
>
> Just some random ideas
>
> - Vishesh Handa
>
>
> *Forwarded Queries* are automatically (by [good] design) included in a
> *P2P* network. There are problems with permissions (*Sharing others'
> metadata*), but I have heard of a P2P network with this feature. I
> will try to remember and/or google it.
>
> All of the above is IMHO and may be incorrect ))
>
>
>
> On Tue, Jun 29, 2010 at 7:01 PM, Daniele E. Domenichelli
> <daniele.domenichelli at gmail.com> wrote:
>
> On 06/29/2010 12:57 PM, Artem Serebriyskiy wrote:
> > Can you please describe your idea in more detail?
> > What information is stored locally?
> > And how do you perform queries? For example:
> > "Select all resources that have a label <string literal>"
> > and
> > "Select all [music] files that have a <NCO:Contact | where this
> > nco:contact is stored locally> as author".
>
>
> Well, I have 3 different scenarios in my mind:
>
>
> 1: during synchronization both resources and relations are copied:
>
> * resources on a remote nepomuk server can be stored locally, but, in
> order to avoid conflicts between URIs, the name can be replaced, for
> example:
> from: nepomuk:/<resource>
> to: telepathy:/contact/<name>/nepomuk/<resource>
> (or nepomuk:/telepathy/contact/<name>/<resource>)
>
> * relations can be copied by modifying the names of the subject and
> of the object using the new names, for example
> from: nepomuk:/<resource1>
> <relation>
> nepomuk:/<resource2>
> to: telepathy:/contact/<name>/nepomuk/<resource1>
> <relation>
> telepathy:/contact/<name>/nepomuk/<resource2>
>
> In this way:
> * queries can be done locally even if the contact is offline
> * the URI for a resource will always be unequivocal (but it might
> require some relation to represent, for example, that a resource
> representing a file on my PC corresponds to a resource that represents
> the same file on my contact's PC)
> * when the contact is online you could use dbustubes to execute (and to
> listen for changes to) a specific query
>
>
> 2: Same as 1, but only resources are copied
>
> In this way:
> * queries executed when the contact is online can return results using
> both the local and the remote nepomuk server (using dbustubes to execute
> queries on the remote server), but queries executed when the contact is
> offline can return results using the local server only.
> * The local database will contain less information, so it will probably
> be smaller and faster, but remote queries will probably take longer
> due to network latency
>
>
> 3: No synchronization at all
>
> In this way, queries on a remote server can be executed only if the
> contact is online.
>
>
>
> In all cases, queries will just be normal queries that return some
> resources of type "nepomuk:/" and some resources of type
> "telepathy:/contact/<name>/nepomuk/", but they might need to be
> executed both on local and on remote servers using dbustubes (and
> resources from remote servers must be modified to represent the name
> of the contact that created them)
>
>
>
> Cheers,
> Daniele
> _______________________________________________
> Nepomuk mailing list
> Nepomuk at kde.org
> https://mail.kde.org/mailman/listinfo/nepomuk
>
>
>
>
>
>
>
> --
> Sincerely yours,
> Artem
>
>
>