[Kde-pim] Re: Architecture: Akonadi, Nepomuk, SPARQL

Volker Krause vkrause at kde.org
Mon Apr 25 15:11:13 BST 2011


On Friday 22 April 2011 23:41:13 Michael Schuerig wrote:
> On my quest to export contacts matching some criteria, I read around a
> bit in blog posts, presentations and Robert Zwerus's thesis on Akonadi.
> 
> At first my understanding was, that Akonadi treats the items (payload)
> it stores as opaque, apart from a little generic metadata. This layer
> uses a relational database (MySQL by default) to store its data.
> 
> But, I was wondering, how can I ever hope to efficiently query this data
> for items matching my criteria? As it turns out, there's another layer
> for indexing/searching on top of the storage layer: Nepomuk. Presumably
> Nepomuk, by means of strigi, parses and indexes the Akonadi payload data
> and stores its result in a Virtuoso database. Virtuoso in turn quickly
> responds to queries formulated in SPARQL.

Correct.
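
To illustrate that path: a query against the Nepomuk/Virtuoso store could 
look roughly like this. This is only a sketch; the NCO ontology terms and 
the "@kde.org" filter are just examples I picked, not anything prescribed 
by Akonadi:

    // Sketch: find contacts whose email address ends in "@kde.org" via
    // Nepomuk's SPARQL interface (Soprano). Adjust ontology terms as needed.
    #include <Nepomuk/ResourceManager>
    #include <Soprano/Model>
    #include <Soprano/QueryResultIterator>
    #include <QtCore/QString>
    #include <QtCore/QDebug>

    void findKdeContacts()
    {
        const QString query = QString::fromLatin1(
            "PREFIX nco: <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> "
            "SELECT DISTINCT ?name ?email WHERE { "
            "  ?contact a nco:PersonContact ; "
            "           nco:fullname ?name ; "
            "           nco:hasEmailAddress ?addr . "
            "  ?addr nco:emailAddress ?email . "
            "  FILTER( REGEX( STR(?email), \"@kde.org$\" ) ) "
            "}" );

        Soprano::QueryResultIterator it =
            Nepomuk::ResourceManager::instance()->mainModel()->executeQuery(
                query, Soprano::Query::QueryLanguageSparql );
        while ( it.next() )
            qDebug() << it.binding( "name" ).toString()
                     << it.binding( "email" ).toString();
    }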

> There are a few things about this picture that I don't understand,
> assuming I've got the picture correctly. The first thing is why there
> need to be two separate database servers when Virtuoso itself is a quite
> capable relational/SQL database? Storing related data consistently in
> two databases instead of just one surely doesn't make things easier.

It started this way for historical reasons; Nepomuk wasn't using Virtuoso 
back then, for example. During Akademy last year we tried to get Akonadi 
working with Virtuoso as its database backend. It turned out, though, that 
this isn't that easy: the SQL dialects are too different. We got to the point 
where Akonadi was able to connect to Virtuoso and, with some hacks, even 
create its tables, but it still failed when trying to write its data. Nothing 
really hard, but probably a week or two of work still needs to be done.

However, just using Virtuoso this way will not magically solve anything; we 
need both: bit-perfect retrieval of the payload and (full-text/semantic) 
indexing. Inherently, this roughly doubles the amount of stored data, no 
matter whether you use one or two databases (in theory; in practice using the 
same database can of course have some synergies in resource usage etc., which 
is why we tried it).
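
To make the "both ways" point a bit more concrete, the bit-perfect side on 
the client looks roughly like this (a minimal sketch; the collection id 42 
is just a placeholder for a real contacts collection):

    // Sketch: fetch the original, unmodified payloads through Akonadi,
    // independent of whatever Nepomuk has indexed. Collection(42) is a
    // placeholder.
    #include <Akonadi/Collection>
    #include <Akonadi/Item>
    #include <Akonadi/ItemFetchJob>
    #include <Akonadi/ItemFetchScope>
    #include <KABC/Addressee>
    #include <QtCore/QDebug>

    void dumpContacts()
    {
        Akonadi::ItemFetchJob *job =
            new Akonadi::ItemFetchJob( Akonadi::Collection( 42 ) );
        job->fetchScope().fetchFullPayload();
        if ( !job->exec() )
            return;
        foreach ( const Akonadi::Item &item, job->items() ) {
            if ( item.hasPayload<KABC::Addressee>() )
                qDebug() << item.payload<KABC::Addressee>().fullEmail();
        }
    }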

> Opacity vs semantics. The approach seems to be to first store data
> opaquely, i.e., with no model of what the data may be. Then let the
> strigi analyzers have a go at the blobs and mould them into things
> recognizable as messages, contacts, or calendar events. In other words,
> for the payload data to be useful and to be easily queriable in
> particular, it has to conform to some model (or schema or ontology).
> Then why not store it in such a "semantically rich" way to begin with?

For something like contacts you might be able to reassemble the original data 
structure from data in Nepomuk, but this doesn't work for emails. For things 
like cryptographic signatures you basically need a bit-perfect recreation of 
the original payload data to verify the signature.
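
To verify e.g. a PGP/MIME signature you work on the exact serialized bytes 
of the message, which you get back from Akonadi itself (a rough sketch; the 
actual verification step, e.g. via GpgME, is left out):

    // Sketch: recover the exact MIME bytes of a mail from its Akonadi item.
    // A signature verifier needs these verbatim; a representation
    // reassembled from Nepomuk data would not do.
    #include <Akonadi/Item>
    #include <kmime/kmime_message.h>
    #include <QtCore/QByteArray>

    QByteArray rawMessageBytes( const Akonadi::Item &item )
    {
        if ( !item.hasPayload<KMime::Message::Ptr>() )
            return QByteArray();
        KMime::Message::Ptr msg = item.payload<KMime::Message::Ptr>();
        return msg->encodedContent(); // the serialized message as stored
    }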

> I can easily imagine how an architecture can grow to look like this. But
> with hindsight, is this really the best way to fulfill the current
> requirements?

Much of this has grown historically, of course, and some things can certainly 
be improved, such as the use of two different database servers. But in 
general, all of these components have their use-cases.

What we will probably end up with mid- to long-term (keep in mind this is all 
still under development) is something like this:

- Akonadi provides a unified way to access data from remote sources, and takes 
care of caching, change replay, etc.
- Akonadi payloads are fed into Nepomuk, providing a roughly 1:1 mapping of the 
payloads to a semantic representation.
- Those Nepomuk items are aggregated into higher-level concepts (persons, 
conversations, etc.), not limited to items provided by Akonadi (see the query 
sketch after this list).
- Changes made to Nepomuk objects on both levels are propagated back to the 
corresponding sources (as far as possible).
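
As a sketch of what a query over the aggregated level might look like (run 
the same way as the contact query further up; using pimo:Person and 
pimo:groundingOccurrence as the link between a person and its lower-level 
contacts is my assumption here, the modelling isn't settled):

    // Sketch only: aggregated persons together with the email addresses of
    // the contacts grounding them. The ontology terms are assumptions.
    #include <QtCore/QString>

    const QString aggregatedPersonQuery = QString::fromLatin1(
        "PREFIX nco:  <http://www.semanticdesktop.org/ontologies/2007/03/22/nco#> "
        "PREFIX pimo: <http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#> "
        "SELECT ?person ?email WHERE { "
        "  ?person a pimo:Person ; "
        "          pimo:groundingOccurrence ?contact . "
        "  ?contact nco:hasEmailAddress ?addr . "
        "  ?addr nco:emailAddress ?email . "
        "}" );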

Depending on its use-case, an application works on one of those levels. While 
e.g. the address book should probably work on aggregated persons, a phone sync 
tool might need the lower-level information to know what's already on the 
phone.
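
For the sync case, the relevant lower-level handle is the per-resource remote 
identifier on the item; something along these lines (a sketch only):

    // Sketch: collect the remote identifiers of the items in a collection,
    // which a sync tool could compare against what is on the device.
    #include <Akonadi/Collection>
    #include <Akonadi/Item>
    #include <Akonadi/ItemFetchJob>
    #include <QtCore/QSet>
    #include <QtCore/QString>

    QSet<QString> knownRemoteIds( const Akonadi::Collection &collection )
    {
        QSet<QString> ids;
        Akonadi::ItemFetchJob *job = new Akonadi::ItemFetchJob( collection );
        if ( job->exec() ) {
            foreach ( const Akonadi::Item &item, job->items() )
                ids.insert( item.remoteId() );
        }
        return ids;
    }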

This looks quite complex, but managing contacts is not as easy as it was 15 
years ago, now that you have a dozen different sources of information about 
the same person :)

regards
Volker