Why using Nepomuk as a contact store is probably not a good idea

Thu Jul 21 01:24:05 CEST 2011

On Thu, Jul 21, 2011 at 00:30, Paolo Capriotti <
paolo.capriotti at collabora.co.uk> wrote:

> On 20/07/11 21:04, George Goldberg wrote:
> >> 2) unavoidable hard dependency on Nepomuk, which is highly undesirable
> >> for many people (me included, as a user)
> >
> > Well, this is the way KDE is going. If you don't like it, I suggest
> > you go to kde-core-devel and suggest that the direction of KDE as a
> > whole is completely changed.
>
> It is my understanding that most of the Nepomuk integration happening in
> KDE is in the form of an optional dependency. The idea being that they
> dump their data in the Nepomuk store, so that potentially applications
> can consume it.
> As far as I know, there is no other component, apart from those relevant
> to Nepomuk itself, that actually require Nepomuk for their basic
> functionality.
>

In that case, look at it as we're the ones pushing Nepomuk further ;) To be
honest, I'm quite proud of it. I see lots of potential in this once the
desktop is fully semantic (Plasma Active is the closest to that currently).

>
> >
> >> 3) need to perform back and forth synchronization between the data
> >> store and telepathy (and possibly future data sources, if we want to
> >> add them)
> >
> > And this is a problem because? Note that as I explain further down,
> > nepomuk data is only synced to nepomuk, there is no writeback.
>
> It is a problem because synchronization is hard, and there are many ways
> to get it wrong, and very few to get it right :)
> I didn't know writeback was not supported (see my concerns below). I
> suppose that makes this issue less relevant.
> Still, if you want to support adding local information to contacts
> (which seems to be one of the most compelling use-cases of the whole
> nepomuk thing), I don't see how you can avoid it.
>

I didn't get this part. Sync is not really that hard - you get a changed()
signal from tp-qt4, you catch it and push the changed data into Nepomuk;
that change itself triggers another changed() signal, which then gets caught
in the models. I don't see any big problems here. If the service/Nepomuk is not
running for any reason, it will resync once it's back online (but it doesn't
really matter as other apps won't be able to access that data without
Nepomuk running anyway, so we're good).
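
To make the flow above concrete, here is a minimal sketch in Python. It is
not the tp-qt4 or Nepomuk API: ContactSource, Store, SyncService and all of
their methods are hypothetical stand-ins, and it assumes a full resync is
acceptable whenever the store comes back online.

```python
from typing import Callable, Dict, List

Listener = Callable[[str, dict], None]


class ContactSource:
    """Hypothetical stand-in for the Telepathy side; emits a changed signal."""

    def __init__(self) -> None:
        self.contacts: Dict[str, dict] = {}
        self._listeners: List[Listener] = []

    def connect_changed(self, callback: Listener) -> None:
        self._listeners.append(callback)

    def update(self, contact_id: str, **fields) -> None:
        # Apply the change, then notify every listener with a copy.
        self.contacts.setdefault(contact_id, {}).update(fields)
        for callback in list(self._listeners):
            callback(contact_id, dict(self.contacts[contact_id]))


class Store:
    """Hypothetical stand-in for the Nepomuk side; may be offline."""

    def __init__(self) -> None:
        self.online = True
        self.data: Dict[str, dict] = {}
        self._listeners: List[Listener] = []

    def connect_changed(self, callback: Listener) -> None:
        self._listeners.append(callback)

    def write(self, contact_id: str, data: dict) -> None:
        if not self.online:
            raise ConnectionError("store is not running")
        self.data[contact_id] = data
        # Writing triggers the second changed signal, the one models listen to.
        for callback in list(self._listeners):
            callback(contact_id, data)


class SyncService:
    """One-way sync: source -> store, never the other direction."""

    def __init__(self, source: ContactSource, store: Store) -> None:
        self.source = source
        self.store = store
        source.connect_changed(self._on_source_changed)

    def _on_source_changed(self, contact_id: str, data: dict) -> None:
        # Changes arriving while the store is down are skipped here;
        # resync() picks them up later.
        if self.store.online:
            self.store.write(contact_id, data)

    def resync(self) -> None:
        # Push the source's current state wholesale once the store is back.
        for contact_id, data in self.source.contacts.items():
            self.store.write(contact_id, dict(data))
```

A model would call store.connect_changed(...) and never talk to the source
directly; after an outage the service calls resync() once.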

>
> >> 4) a sort of impedance mismatch: RDF is designed as a format to express
> >> factual truths, usually relatively static and not frequently subject to
> >> change; using it for data as dynamical as a live contact list is very
> >> questionable
> >
> > Sounds like you aren't that familiar with the goals of Nepomuk, even
> > if you know far more than me about what the designers of RDF intended
> > it for. Please go talk to the Nepomuk devs more about what they are
> > trying to achieve.
>
> I know what Nepomuk is about. I have to admit that I'm not familiar with
> the technical details of the KDE incarnation of it, but I am familiar
> with RDF and related technologies.
> This, however, is not about Nepomuk. I have respect for Nepomuk's
> vision, and, even though I'm sort of oblivious to all the buzz that
> surrounds it, I understand the implications and I'm not trying to say
> that data shouldn't be pushed into Nepomuk at all, as I've stated already.
> What I am against is Nepomuk used as an *internal* architectural
> component. I think that's a problematic decision, and I see no merit in it.
>

This is only the first step. Once there is more data in Nepomuk, you'll get
some awesome features. For example, I'm currently working on merging PIM
stuff into PIMO:Person (that's what kde-tp uses as contacts) and once done,
we can have PIM features in kde-tp (i.e. you can email people from our
contact list) or likewise have IM features in PIM (see their presence in
KMail etc.). Think of the long term here.

>
> >> 5) concurrency issues (think of clients running on different machines)
> >
> > Please elaborate. I don't understand what you are implying here.
>
> Not relevant if you're not planning to offer a writable API, but the
> scenario I had in mind was a user making conflicting changes on two
> different machines (just one of the two needs to be through kde-telepathy).
> Normally, the telepathy CM would take care of serializing the changes,
> but if you introduce a middleware component, things are not so clear any
> more.
>

I think you are also missing one important detail of our implementation. We
use Nepomuk for storing your contact list. That list itself is read-only.
You can't really change anything about the contacts in it, except maybe the
groups they're in or their subscription state. Sure, you could rename your
contacts to something more sensible (does tp-qt4 even allow for that?), but
that's pretty much all the changes I can see being needed.

>
> >> 6) all the usual consistency problems of using an intermediate layer
> >> as cache, but without the freedom of manipulating it any way you want,
> >> because other applications can access it
> >
> > But with all the advantages of making information available to all the
> > other KDE applications which might want it...
>
> No, that's not unique to a Nepomuk-based approach. The solution I
> proposed would have this feature. I wouldn't have considered it otherwise.
>
> The difference is that it doesn't hold a copy of the data, it just
> aggregates it and exposes it through a convenient API.
>
> >> 7) tons of potentially unsolvable performance problems, usually
> >> related to point 4
> >
> > Such as? Please don't just spout the same old "nepomuk is shit" stuff.
> > Substantiate these kinds of claims if you want them to be taken
> > seriously.
>
> Please don't be unfair. As I wrote before, I tried to keep each point
> short, because my mail was getting quite long. I can of course elaborate.
>
> The performance problems I'm talking about are nothing new, and have
> been mentioned before. The fact that they can be solved is irrelevant (I
> already pointed out that everything *can* be solved, that doesn't make
> this a good solution).
>

The performance has been greatly improved over the last few releases, even
more so for 4.7. I don't think this is a big problem these days. In fact, I
don't see it as a problem at all (and I work with Nepomuk heavily every day,
trust me, I know what I'm talking about ;)

>
> To be specific, I'm referring to:
> - the obvious observation that using an ultra-generic API for a specific
> task that requires no generality imposes overhead, no matter how good
> the implementation
> - extremely volatile data getting stored to and fetched from disk
> - no control over what is happening in the lower layer, which is too
> general to be aware of potential domain specific optimizations (both in
> terms of space and time)
>
> Can those issues be solved or sidestepped? Maybe. But it's yet another
> reason for me to be sceptical about this solution, since there are
> alternatives which have no comparable danger of turning out infeasible
> performance-wise, and offer *identical* benefits from a user's perspective.
>
> >> 8) all the use cases I can think of can be covered anyway: if
> >> there is a need to have contact data available in nepomuk for
> >> "semantic" applications to consume, there's nothing that would prevent
> >> exporting it even if we're using another data layer internally. I'm
> >> talking about a simple one-way synchronization mechanism that doesn't
> >> need to be real-time or efficient. However, I doubt there would be any
> >> rational reasons for an application to prefer an RDF API to the domain
> >> specific API that we would provide.
> >
> > So, this seems to indicate a misunderstanding of what KDE-Telepathy is
> > trying to achieve with Nepomuk integration. At this stage, there are
> > *no* plans to have any kind of writeback via nepomuk to Telepathy. If
> > you want to write data, use TpQt4. If you just want to consume it, use
> > Nepomuk.
>
> As I mentioned before, I'm not sure this makes much sense.
> First, you need to provide write access for many use cases which don't
> relate directly to telepathy, even basic features such as manually
> merging contacts or adding local properties.
>

There is such a thing. It is not currently implemented in kde-tp, but I'm
working myself on a tool that will enable you to merge whichever contacts
you want, even including properties. However, this will be part of KDE PIM.
Nevertheless, should the need arise, we can always add that directly into
kde-tp. But the long-term vision tells me it will happen elsewhere.

>
> Second, if the architecture is really like this:
>
> API  <---- Nepomuk <----- Telepathy
>  |                           ^
>  -----------------------------
>
> then I have to say it's even worse than I thought. You have a middle
> layer which works only in one direction, so, applications that want r/w
> access need to use two different APIs.
>

Well, not really like that, but yes, two APIs - you have one API to read
data - Nepomuk - and a second API to write data - Telepathy. So the arch is
really something like this:

API  <---- Nepomuk <----- Telepathy
 |                           ^
 -----------------------------

Writing the data in Telepathy will always result in writing the data back to
Nepomuk for free. So we're in sync as I stated before.
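
As a rough illustration of that split, here is a minimal Python sketch. The
classes and methods (TelepathyWriter, NepomukReader, add_to_group,
groups_of) are hypothetical names made up for this mail, not the real
tp-qt4 or Nepomuk interfaces; the only point is that an application writes
through one object and reads through the other.

```python
from typing import Callable, Dict, List, Set


class TelepathyWriter:
    """Hypothetical write-side API; the only place changes are made."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str, str], None]] = []

    def connect_group_added(self, callback: Callable[[str, str], None]) -> None:
        self._listeners.append(callback)

    def add_to_group(self, contact_id: str, group: str) -> None:
        # In reality the connection manager would apply the change;
        # here we only announce it to whoever is listening.
        for callback in list(self._listeners):
            callback(contact_id, group)


class NepomukReader:
    """Hypothetical read-side API; exposes no setters to applications."""

    def __init__(self, writer: TelepathyWriter) -> None:
        self._groups: Dict[str, Set[str]] = {}
        # The read side is fed only by changes announced on the write side.
        writer.connect_group_added(self._on_group_added)

    def _on_group_added(self, contact_id: str, group: str) -> None:
        self._groups.setdefault(contact_id, set()).add(group)

    def groups_of(self, contact_id: str) -> Set[str]:
        # Return a copy so callers cannot modify the stored data.
        return set(self._groups.get(contact_id, set()))


# An application that needs r/w access holds both objects.
writer = TelepathyWriter()
reader = NepomukReader(writer)
writer.add_to_group("bob@example.org", "Friends")
print(reader.groups_of("bob@example.org"))
```

The write shows up on the read side without the application doing anything
extra, which is what I mean by "for free".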

>
> More importantly, you have all those sorts of problems where you use
> telepathy to change something, and then expect the change to be
> propagated into Nepomuk, but what exactly happens depends on the
> Telepathy -> Nepomuk conversion.
>

Once again - there's not much to change anyway. And even the little there is
works the way we want it to.

>
> > You want to NIH Nepomuk (which is what the rest of KDE is working to
> > support), and you justify it with the code reuse argument?!
>
> I'm not sure I understand your usage of "NIH". I'm not proposing to
> rewrite Nepomuk.
> I'm proposing to use *another* solution, which is more apt to solving
> the problem, in my opinion.
> I'm talking about code reuse in a cross-desktop sense. I claim that
> there is a lot less code to write, because components like libfolks
> address our particular use cases (e.g. contact merging, aggregating data
> from tp...), while Nepomuk is more general, and not strictly concerned
> with contacts or real time communication.
>

I see that as our advantage - libfolks will get you a list of folks, ok,
cool. But with Nepomuk we can do all sorts of crazy stuff - especially now
with the Nepomuk-enabled activity-manager. You'll get all sorts of desktop
events and whatnot stored in Nepomuk and, thanks to our implementation, also
their relations to your contacts. So for example you can see when your
contact sent you a file or called you or whatever. Try to see the big
picture here, especially with environments like Plasma Active, which is
built entirely on Nepomuk. And we're the ones pushing this integration
forward, even if we're currently the only ones using it. Others will come
and expand that, I'm sure of it.

>
> >> 2) lots of work already done, tested, and used in production in other
> >> projects
> >
> > I find that pretty offensive given I've been working on Telepathy in
> > KDE for 4.5 years now in its various incarnations.
>
> It's not my intention to offend anyone. The time spent on a project
> doesn't count, what counts are the finished products. In Telepathy KDE,
> the only parts that resemble finished products are those that deal
> directly with Telepathy.
>

Whoa, now you did it again :P That's because the Nepomuk side of things was
finished not long ago (two-three weeks?) and even now there are still
commits and fixes coming in, while the Telepathy implementation was set to
be released on 17th May (waaaay before the Nepomuk thing was even close to
finished). So that's why it is not finished.

>
> >> 5) no user-visible or "politically-loaded" dependency
> >
> > Not interested in getting into an argument about this. Just look at
> > things like plasma active, KDE PIM etc and tell me that we're going
> > out on a limb here.
>
> Please forgive me if what I say isn't entirely correct, but I believe
> KDE PIM has a custom storage infrastructure for their data (Akonadi) and
> only exports data to Nepomuk for indexing/searching purposes.
> That's a perfectly sensible architecture, and it mirrors exactly what I
> am advocating for KDE Telepathy.
>
> As for plasma active, I have no idea. In any case, what they choose to
> use shouldn't affect us. A bad technical decision isn't justifiable with
> an argument about the community.
>

Watch out here. You haven't proved it is a /bad/ technical decision; so far
you've only shown us that there are better solutions. I agree there are
more solutions to our problems, and some even fit our needs better. But
yet again, we decided to choose Nepomuk because of the bigger picture here
(I'm getting tired of writing this again and again :P)

>
> > In summary, I think that part of your concerns are addressed by a
> > misunderstanding of the way KDE Telepathy and Nepomuk interact - it is
> > read only from the application's point of view. They should be using
> > TpQt4 if they want to modify stuff (at least at this stage, although
> > we haven't considered any kind of write-back from Nepomuk yet). Also,
> > I think you need to read a bit more about how Nepomuk is being used in
> > KDE, whether or not this fits with the original intended goals of RDF
> > is not really relevant. Also, please remember that there are a lot of
> > advantages to us integrating with Nepomuk that you have completely
> > omitted to acknowledge - see my various blog posts over the years, and
> > e.g. Martin's Summer of Code project, and the other work going on in
> > KDE PIM at the moment.
>
> Hopefully, by now it's clear that I don't want to exclude Nepomuk
> completely. I'm just saying that it shouldn't be an inner part of the
> architecture, just a write-only leaf in our dependency graph.
>

If real-world usage shows us it was a bad decision and that it doesn't
really work, I'm willing to fully acknowledge you were right and we were
idiots. But from my experience so far, it works perfectly fine. Sorry, but
I remain unconvinced. You raised some valid theoretical points, but practice
says otherwise so far...

--
Marty K.

>
> BR,
> Paolo
> _______________________________________________
> KDE-Telepathy mailing list
> KDE-Telepathy at kde.org
> https://mail.kde.org/mailman/listinfo/kde-telepathy
>