Why using Nepomuk as a contact store is probably not a good idea

Thu Jul 21 17:11:46 CEST 2011

Hello Paolo,

On 20/07/11 20:40, Paolo Capriotti wrote:
> The idea of RDF and related technologies is to have a unified format
> to express any sort of knowledge, with no limitations in scope and
> structure, so that an automated tool can reason about it, process it,
> and extract information.
>
> The basic assumption that makes such an abstraction worthwhile is that
> the data maintained in RDF *can* actually be processed by tools that
> are agnostic to the particular type of data that is stored there.
>
> That is not the case for contacts. The way we can use contacts data
> stored in an RDF store is by writing *specific* contact-related APIs.
> That completely negates the purpose of using a generic data store in
> the first place.

You are assuming that we are using Telepathy just to extract
*information about telepathy-contacts* and store it in Nepomuk, but
what we are trying to do is extract from Telepathy *knowledge about
persons*.
Other parts of KDE (Akonadi, for example) will do exactly the same,
extracting information of a different kind and storing it in Nepomuk
as well.
This kind of information must be stored in an agnostic store (because
kde-telepathy won't know anything about SMTP and accounts, and Akonadi
won't know anything about Telepathy), and all this data must be
available to any other part of the desktop.
But this doesn't mean that kde-telepathy shouldn't have a library with
the most common queries, etc. Of course we *could* get this data
directly from Telepathy, but since we are pushing it into Nepomuk
anyway, because we want this data to be usable by any other program,
I see no reason why we shouldn't use the data already in Nepomuk.

Even if a program doesn't use the telepathy-kde library, you might want
to access Telepathy data, so that from Dolphin, for example, you can
query for all the files that a certain person sent you, and you will get
all the files that person sent you by IM, by mail, on a USB key, or
whatever, without knowing anything about telepathy-kde.
Another program could add completely different information to a person:
you could write a program that associates a person with what they like,
and then query for all the persons who like dogs whose colour is brown
and who are currently online, so that you can show them the picture of
your beautiful brown dog. That's the idea behind RDF and related
technologies... Knowledge instead of raw data.
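
To make the brown-dog query concrete, here is a minimal sketch in
Python rather than SPARQL. It assumes a toy in-memory list of triples;
the person names and the predicates ("likes", "colour", "presence",
"type") are made up for illustration and are not real Nepomuk ontology
terms. The point is only that the triples can come from unrelated
programs and still be joined in one query.

```python
# Toy in-memory triple store: (subject, predicate, object).
# All names below are hypothetical and NOT real Nepomuk ontology terms.
TRIPLES = [
    ("alice", "presence", "online"),   # e.g. pushed by an IM service
    ("bob", "presence", "offline"),
    ("carol", "presence", "online"),
    ("alice", "likes", "rex"),         # e.g. pushed by some other program
    ("bob", "likes", "rex"),
    ("carol", "likes", "tom"),
    ("rex", "type", "dog"),
    ("rex", "colour", "brown"),
    ("tom", "type", "cat"),
    ("tom", "colour", "brown"),
]


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is stored."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def online_brown_dog_lovers():
    """Persons who like a brown dog and are currently online."""
    result = set()
    for person, predicate, thing in TRIPLES:
        if predicate != "likes":
            continue
        is_brown_dog = ("dog" in objects(thing, "type")
                        and "brown" in objects(thing, "colour"))
        if is_brown_dog and "online" in objects(person, "presence"):
            result.add(person)
    return sorted(result)


print(online_brown_dog_lovers())
```

The query function never needs to know which program stored the
"likes" triples and which one stored the "presence" triples.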

It is not about data-agnostic tools, but about tool-agnostic tools:
a tool must know exactly what kind of data it is going to use,
otherwise it can do nothing with it except produce other data using
reasoning and inference. At the same time, it doesn't need to know
anything about which other tools produce and use the same kind of
information. nepomuk-telepathy-service doesn't know which other programs
will use the data that it pushes into Nepomuk. Likewise, the
telepathy-kde library uses data in Nepomuk without knowing who added
that data, but it knows exactly *which* kind of data it needs.

> So here's what I propose as architecture:
>
> - Storage-agnostic API at the top level. Candidates: QtContact
> (http://doc.qt.nokia.com/qtmobility-1.2/contacts.html), or roll our
> own.
> - Contacts store. Candidates: libfolks
> (http://telepathy.freedesktop.org/wiki/Folks), or roll our own.

Since we want to offer this information to the whole desktop, using
libfolks, from my point of view, just adds another layer of complexity
to the system and is a waste of resources. We could of course use it,
but then we would have to extract the information to push into Nepomuk
from there.
If you like libfolks and you would like to get the same "metacontacts"
from Empathy and from telepathy-kde, it would be possible to write a
nepomuk-libfolks-service that synchronizes the Nepomuk store with
libfolks (in both directions). Nobody is telling you not to do it... I
think it could be interesting, actually.

About QtContact, I must admit that it could be interesting as well to
investigate whether it is worth writing a backend that uses
telepathy-kde, or Nepomuk directly, as data storage.

> 4) a sort of impedance mismatch: RDF is designed as a format to express factual
> truths, usually relatively static and not frequently subject to
> change; using it for data as dynamical as a live contact list is very
> questionable

Even if I don't agree with the other points, I kind of agree with
this one... In my opinion, volatile data like current presence should
not be stored in Nepomuk directly. This is not because we don't need
the presence in Nepomuk, but because at the moment we are treating as
/volatile/ data what should actually be /permanent/ data.

For example, the event "At 09:32:44 Mr. Foo's presence changed from busy
to available" should be stored in Nepomuk, so that it becomes possible
to make complex time-based queries. Zeitgeist is perhaps better suited
to handling and processing this kind of event and pushing it into
Nepomuk.
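
As a rough illustration of what a time-based query over stored presence
events could look like, here is a small Python sketch. The event layout,
the extra events, the date and the function name are all assumptions
made up for this example; it is not how Nepomuk or Zeitgeist actually
store events.

```python
from datetime import datetime

# Hypothetical event records: (timestamp, person, old presence, new presence).
# Only the first event comes from the example above; the date and the
# other two events are invented for illustration.
EVENTS = [
    (datetime(2011, 7, 21, 9, 32, 44), "Mr. Foo", "busy", "available"),
    (datetime(2011, 7, 21, 12, 5, 0), "Mr. Foo", "available", "away"),
    (datetime(2011, 7, 21, 13, 10, 0), "Mr. Foo", "away", "available"),
]


def presence_at(person, when, default="unknown"):
    """Return the presence `person` had at time `when`, replaying the log."""
    state = default
    for timestamp, who, _old, new in sorted(EVENTS):
        if who == person and timestamp <= when:
            state = new  # the latest event not after `when` wins
    return state


print(presence_at("Mr. Foo", datetime(2011, 7, 21, 12, 30, 0)))
```

Because every change is kept as a permanent fact, the current presence
is just the answer to the same query asked for "now".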

Anyway, this is definitely something that shouldn't be done now, maybe
for the 3rd release ;) I'm quite happy at the moment with the way we
store the presence.

On 20/07/11 23:30, Paolo Capriotti wrote:
> API  <---- Nepomuk <----- Telepathy
>   |                           ^
>   -----------------------------
>
> then I have to say it's even worse than I thought. You have a middle
> layer which works only in one direction, so, applications that want r/w
> access need to use two different APIs.

It's not exactly like this; I see it this way:

API  <---- Nepomuk <----- nepomuk-telepathy-service <----- Telepathy
  |                                                             ^
  ---------------------------------------------------------------

That's exactly what it is meant to be: you (and Telepathy itself) push
raw data into Telepathy, nepomuk-telepathy-service takes the raw data,
extracts knowledge (in the form of RDF triples) and pushes it into
Nepomuk so that it is available to anyone. The telepathy-kde API offers
this knowledge to the developers of KDE applications that use Telepathy,
so that it is not necessary to rewrite the same queries every time in
every application.
But if you are writing an application that uses pimo:person but not
Telepathy, you can just use the data from Nepomuk instead of the API;
or, if you think the data from Nepomuk is not necessary, you can
perhaps use raw data from Telepathy.
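
The data flow described here can be sketched in a few lines of Python.
Everything in this sketch is a hypothetical stand-in: a plain set plays
the role of Nepomuk, and the function names, dictionary keys and
predicates are invented for illustration; none of it is the real
telepathy-kde or Nepomuk API.

```python
# Hypothetical stand-ins for the layers in the diagram; not real API.
store = set()  # plays the role of Nepomuk: a set of (s, p, o) triples


def telepathy_service_push(raw_contact):
    """Plays nepomuk-telepathy-service: turn raw data into triples."""
    person = raw_contact["alias"]
    store.add((person, "type", "pimo:Person"))
    store.add((person, "imAccount", raw_contact["account"]))
    store.add((person, "presence", raw_contact["presence"]))


def api_online_persons():
    """Plays the telepathy-kde API: a canned, commonly needed query."""
    return sorted(s for s, p, o in store if p == "presence" and o == "online")


def generic_persons():
    """A program that only knows about persons queries the store directly."""
    return sorted(s for s, p, o in store if p == "type" and o == "pimo:Person")


# Raw data as it might arrive from the IM layer (made-up values).
telepathy_service_push(
    {"alias": "alice", "account": "alice@example.org", "presence": "online"})
telepathy_service_push(
    {"alias": "bob", "account": "bob@example.org", "presence": "offline"})

print(api_online_persons())
print(generic_persons())
```

The two query functions never talk to each other or to the service;
they only share the store and an agreement on what the triples mean.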

> Sorry for the long mail. :)
Sorry for the long reply. ;)

Cheers,
  Daniele