Why using Nepomuk as a contact store is probably not a good idea

Paolo Capriotti paolo.capriotti at collabora.co.uk
Thu Jul 21 00:30:57 CEST 2011


On 20/07/11 21:04, George Goldberg wrote:
>> 2) unavoidable hard dependency on Nepomuk, which is highly undesirable
>> for many people (me included, as a user)
>
> Well, this is the way KDE is going. If you don't like it, I suggest
> you go to kde-core-devel and suggest that the direction of KDE as a
> whole is completely changed.

It is my understanding that most of the Nepomuk integration happening in
KDE takes the form of an optional dependency. The idea is that
applications dump their data in the Nepomuk store, so that other
applications can potentially consume it.
As far as I know, there is no other component, apart from those relevant
to Nepomuk itself, that actually requires Nepomuk for its basic
functionality.

>
>> 3) need to perform back and forth synchronization between the data
>> store and telepathy (and possibly future data sources, if we want to
>> add them)
>
> And this is a problem because? Note that as I explain further down,
> nepomuk data is only synced to nepomuk, there is no writeback.

It is a problem because synchronization is hard, and there are many ways 
to get it wrong, and very few to get it right :)
I didn't know writeback was not supported (see my concerns below). I 
suppose that makes this issue less relevant.
Still, if you want to support adding local information to contacts 
(which seems to be one of the most compelling use-cases of the whole 
nepomuk thing), I don't see how you can avoid it.

>> 4) a sort of impedance mismatch: RDF is designed as a format to express factual
>> truths, usually relatively static and not frequently subject to
>> change; using it for data as dynamical as a live contact list is very
>> questionable
>
> Sounds like you aren't that familiar with the goals of Nepomuk, even
> if you know far more than me about what the designers of RDF intended
> it for. Please go talk to the Nepomuk devs more about what they are
> trying to achieve.

I know what Nepomuk is about. I have to admit that I'm not familiar with 
the technical details of the KDE incarnation of it, but I am familiar 
with RDF and related technologies.
This, however, is not about Nepomuk. I have respect for Nepomuk's 
vision, and, even though I'm sort of oblivious to all the buzz that 
surrounds it, I understand the implications and I'm not trying to say 
that data shouldn't be pushed into Nepomuk at all, as I've stated already.
What I am against is Nepomuk used as an *internal* architectural 
component. I think that's a problematic decision, and I see no merit in it.

>> 5) concurrency issues (think of clients running on different machines)
>
> Please elaborate. I don't understand what you are implying here.

Not relevant if you're not planning to offer a writable API, but the 
scenario I had in mind was a user making conflicting changes on two 
different machines (just one of the two needs to be through kde-telepathy).
Normally, the telepathy CM would take care of serializing the changes, 
but if you introduce a middleware component, things are not so clear any 
more.
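
To make that scenario concrete, here is a minimal Python sketch of one
way it can go wrong. It assumes a middle layer that keeps its own copy of
a contact and writes the whole record back; every class and field name
below is hypothetical and not taken from Telepathy or Nepomuk.

```python
class ConnectionManager:
    """Stand-in for a Telepathy CM: the single authority for contact state."""

    def __init__(self):
        self.contact = {"alias": "Alice", "group": "Friends"}

    def set_field(self, field, value):
        # Changes arrive one at a time and touch only the field they name.
        self.contact[field] = value


class CachingMiddleware:
    """Hypothetical middle layer that keeps its own copy of the contact."""

    def __init__(self, cm):
        self.cm = cm
        self.copy = dict(cm.contact)  # snapshot taken at sync time

    def set_field(self, field, value):
        # Writes back the whole cached record, including stale fields.
        self.copy[field] = value
        for key, cached_value in self.copy.items():
            self.cm.set_field(key, cached_value)


cm = ConnectionManager()
middleware = CachingMiddleware(cm)          # machine A, via the middle layer

cm.set_field("group", "Work")               # machine B talks to the CM directly
middleware.set_field("alias", "Alice L.")   # machine A edits a different field

final_state = cm.contact
```

The two edits touch different fields, yet the change made on machine B
is overwritten because the middle layer replays its stale snapshot. When
both machines talk to the CM directly, each change lands independently.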

>> 6) all the usual consistency problems of using an intermediate layer
>> as cache, but without the freedom of manipulating it any way you want,
>> because other applications can access it
>
> But with all the advantages of making information available to all the
> other KDE applications which might want it...

No, that's not unique to a Nepomuk-based approach. The solution I
proposed would have this feature. I wouldn't have considered it otherwise.

The difference is that it doesn't hold a copy of the data, it just 
aggregates it and exposes it through a convenient API.
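
As a rough illustration of what "aggregates without holding a copy"
could mean, here is a small Python sketch. It is not how libfolks or any
existing library is implemented; the class and method names are made up
for the example, and the Telepathy side is a fake in-memory stand-in.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Protocol


@dataclass(frozen=True)
class Contact:
    source: str       # which backend the contact came from
    contact_id: str
    alias: str
    presence: str


class ContactSource(Protocol):
    """Anything that can list its contacts on demand (hypothetical interface)."""

    def contacts(self) -> Iterable[Contact]: ...


class FakeTelepathySource:
    """Stand-in for a live Telepathy connection; it owns the data."""

    def __init__(self) -> None:
        self._presence: Dict[str, str] = {"alice@example.org": "available"}

    def set_presence(self, contact_id: str, presence: str) -> None:
        self._presence[contact_id] = presence

    def contacts(self) -> List[Contact]:
        return [Contact("telepathy", cid, cid.split("@")[0], p)
                for cid, p in self._presence.items()]


class ContactAggregator:
    """Exposes one API over many sources without keeping its own copy."""

    def __init__(self, sources: List[ContactSource]) -> None:
        self._sources = sources

    def all_contacts(self) -> List[Contact]:
        # Every call asks the sources again, so there is nothing to keep in sync.
        return [c for s in self._sources for c in s.contacts()]


tp = FakeTelepathySource()
aggregator = ContactAggregator([tp])
before = aggregator.all_contacts()[0].presence
tp.set_presence("alice@example.org", "away")
after = aggregator.all_contacts()[0].presence
```

Because the aggregator stores nothing itself, a presence change in the
source is visible on the very next query, with no synchronization step.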

>> 7) tons of potentially unsolvable performance problems, usually
>> related to point 4
>
> Such as? Please don't just spout the same old "nepomuk is shit" stuff.
> Substantiate these kinds of claims if you want them to be taken
> seriously.

Please don't be unfair. As I wrote before, I tried to keep each point 
short, because my mail was getting quite long. I can of course elaborate.

The performance problems I'm talking about are nothing new, and have
been mentioned before. The fact that they can be solved is irrelevant (I
already pointed out that everything *can* be solved; that doesn't make
this a good solution).

To be specific, I'm referring to:
- the obvious observation that using an ultra-generic API for a specific 
task that requires no generality imposes overhead, no matter how good 
the implementation
- extremely volatile data getting stored to and fetched from disk
- no control over what is happening in the lower layer, which is too 
general to be aware of potential domain specific optimizations (both in 
terms of space and time)

Can those issues be solved or sidestepped? Maybe. But it's yet another 
reason for me to be sceptical about this solution, since there are 
alternatives which have no comparable danger of turning out infeasible 
performance-wise, and offer *identical* benefits from a user's perspective.

>> 8) all the use cases I can think of can be covered anyway: if
>> there is a need to have contact data available in nepomuk for
>> "semantic" applications to consume, there's nothing that would prevent
>> us from exporting it even if we're using another data layer internally. I'm
>> talking about a simple one-way synchronization mechanism that doesn't
>> need to be real-time or efficient. However, I doubt there would be any
>> rational reasons for an application to prefer an RDF API to the domain
>> specific API that we would provide.
>
> So, this seems to indicate a misunderstanding of what KDE-Telepathy is
> trying to achieve with Nepomuk integration. At this stage, there are
> *no* plans to have any kind of writeback via nepomuk to Telepathy. If
> you want to write data, use TpQt4. If you just want to consume it, use
> Nepomuk.

As I mentioned before, I'm not sure this makes much sense.
First, you need to provide write access for many use cases which don't 
relate directly to telepathy, even basic features such as manually 
merging contacts or adding local properties.

Second, if the architecture is really like this:

API  <---- Nepomuk <----- Telepathy
  |                           ^
  -----------------------------

then I have to say it's even worse than I thought. You have a middle
layer which works only in one direction, so applications that want
read/write access need to use two different APIs.

More importantly, you run into the whole class of problems where you
change something through Telepathy and expect the change to propagate
into Nepomuk, but what actually ends up there depends on the
Telepathy -> Nepomuk conversion.

> You want to NIH Nepomuk (which is what the rest of KDE is working to
> support), and you justify it with the code reuse argument?!

I'm not sure I understand your usage of "NIH". I'm not proposing to 
rewrite Nepomuk.
I'm proposing to use *another* solution, which is better suited to
the problem, in my opinion.
I'm talking about code reuse in a cross-desktop sense. I claim that 
there is a lot less code to write, because components like libfolks 
address our particular use cases (e.g. contact merging, aggregating data 
from tp...), while Nepomuk is more general, and not strictly concerned 
with contacts or real time communication.

>> 2) lots of work already done, tested, and used in production in other projects
>
> I find that pretty offensive given I've been working on Telepathy in
> KDE for 4.5 years now in its various incarnations.

It's not my intention to offend anyone. The time spent on a project
doesn't count; what counts are the finished products. In Telepathy KDE,
the only parts that resemble finished products are those that deal
directly with Telepathy.

>> 5) no user-visible or "politically-loaded" dependency
>
> Not interested in getting into an argument about this. Just look at
> things like plasma active, KDE PIM etc and tell me that we're going
> out on a limb here.

Please forgive me if what I say isn't entirely correct, but I believe 
KDE PIM has a custom storage infrastructure for their data (Akonadi) and 
only exports data to Nepomuk for indexing/searching purposes.

That's a perfectly sensible architecture, and it mirrors exactly what I 
am advocating for KDE Telepathy.

As for plasma active, I have no idea. In any case, what they choose to 
use shouldn't affect us. A bad technical decision isn't justifiable with 
an argument about the community.

> In summary, I think that part of your concerns are addressed by a
> misunderstanding of the way KDE Telepathy and Nepomuk interact - it is
> read-only from the applications' point of view. They should be using
> TpQt4 if they want to modify stuff (at least at this stage, although
> we haven't considered any kind of write-back from Nepomuk yet). Also,
> I think you need to read a bit more about how Nepomuk is being used in
> KDE, whether or not this fits with the original intended goals of RDF
> is not really relevant. Also, please remember that there are a lot of
> advantages to us integrating with Nepomuk that you have completely
> omitted to acknowledge - see my various blog posts over the years, and
> e.g. Martin's Summer of Code project, and the other work going on in
> KDE PIM at the moment.

Hopefully, by now it's clear that I don't want to exclude Nepomuk 
completely. I'm just saying that it shouldn't be an inner part of the 
architecture, just a write-only leaf in our dependency graph.
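
A minimal Python sketch of that shape, under the assumption of a
domain-specific store that applications read and write, plus an
exporter that only ever pushes data outward. The names are hypothetical,
and the in-memory triple list merely stands in for a semantic store; it
does not use any real Nepomuk API.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Exporter = Callable[[str, Dict[str, str]], None]


class ContactStore:
    """Hypothetical domain-specific store; applications read and write here."""

    def __init__(self) -> None:
        self._contacts: Dict[str, Dict[str, str]] = {}
        self._exporters: List[Exporter] = []

    def add_exporter(self, exporter: Exporter) -> None:
        self._exporters.append(exporter)

    def update(self, contact_id: str, **fields: str) -> None:
        record = self._contacts.setdefault(contact_id, {})
        record.update(fields)
        # Data flows outward only; nothing is ever read back from an exporter.
        for export in self._exporters:
            export(contact_id, dict(record))

    def get(self, contact_id: str) -> Dict[str, str]:
        return dict(self._contacts[contact_id])


class TripleSink:
    """Stand-in for a semantic store; only ever written to from here."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []

    def export(self, contact_id: str, record: Dict[str, str]) -> None:
        # Replace whatever was exported before for this contact.
        self.triples = [t for t in self.triples if t[0] != contact_id]
        self.triples += [(contact_id, k, v) for k, v in sorted(record.items())]


store = ContactStore()
sink = TripleSink()
store.add_exporter(sink.export)
store.update("alice@example.org", alias="Alice", note="local note")
store.update("alice@example.org", alias="Alice L.")
```

The sketch exports on every update for simplicity. Since the export
does not need to be real-time, a batched or periodic push would fit the
same structure without touching the store's read/write API.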

BR,
Paolo


More information about the KDE-Telepathy mailing list