Nepomuk in 4.13 and beyond

Vishesh Handa me at vhanda.in
Thu Dec 12 10:46:53 UTC 2013


Hey everyone

During the KDE 4.11 cycle Nepomuk reached a maturity level that we were
happy with: it is reasonably fast and stable, and unless used together with
Akonadi it is no longer the "CPU consumer" it was before. We reached this
state after years of analyzing what was wrong and what could be improved, to
the point where we no longer think any further improvement is possible just
by modifying our own code.

The next place where we could seek improvement was the RDF storage. We have
been using Virtuoso for about 4 years and it has been a game changer for us,
performing way faster and more efficiently than any of the alternatives we
used before. But as many of you know (and others suffer), it is not an RDF
storage designed for the desktop, and it never will be. Since nothing better
than Virtuoso exists for our use case, we started to implement our own RDF
storage mechanism (codenamed Vishuoso).

At some point during that work we took a step back and re-analyzed the
problems of the workspace and the current implementation. The problems are -

- Resource Description Framework (RDF)
The biggest problem with RDF is that it raises the knowledge needed to
contribute to a point where most developers decide to skip it. After all
these years only a handful of brave developers have worked with it, and the
experience hasn't been good.

We therefore need something easier to use so that we can see broader
adoption.

Additionally, RDF is a very flexible way to store data, but it is not the
most efficient one. Data is generally completely normalized even though that
is quite often not required. Eg - One doesn't need to store music file
artists as a separate contact. This is great from a theoretical point of
view, but it is not very useful in practice.
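
To make the normalization point concrete, here is a minimal sketch of the
two ways of storing a music file's artist. It is not Nepomuk or Baloo code:
the URIs, the "ex:" property names and the helper functions are all made up
for illustration.

```python
# Illustrative only: the URIs and property names below are made up for this
# sketch and are not taken from the Nepomuk ontologies or from Baloo.

# Fully normalized, RDF-style: the artist is a resource of its own, so
# reading the artist name of a file means following a link (a join).
triples = [
    ("file:/music/song.ogg", "ex:title", "Some Song"),
    ("file:/music/song.ogg", "ex:performer", "ex:contact/42"),
    ("ex:contact/42", "ex:fullname", "Some Artist"),
]


def objects(subject, predicate):
    """Return all objects stored for (subject, predicate)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def artist_name_normalized(file_uri):
    """Two lookups: file -> contact resource -> contact name."""
    contact = objects(file_uri, "ex:performer")[0]
    return objects(contact, "ex:fullname")[0]


# Denormalized: the artist is just a string stored with the file.
records = {
    "file:/music/song.ogg": {"title": "Some Song", "artist": "Some Artist"},
}


def artist_name_flat(file_uri):
    """One lookup, no separate contact object."""
    return records[file_uri]["artist"]
```

Both functions return the same name for the same file; the normalized form
needs an extra resource and an extra lookup to get there.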

- RDF Storage
There is no existing RDF storage designed to work on a desktop. Virtuoso is
great, but it quickly uses hundreds of megabytes of RAM and it has its own
share of problems. The other alternative is Tracker, but it lacks certain
features required in Nepomuk.

- Data duplication
Nepomuk has been used as both a search store and a data store. This results
in massive data duplication and synchronization problems. In the case of
Akonadi, emails are stored in Akonadi, then duplicated in Virtuoso, and then
duplicated again in Virtuoso's index. Every time data is changed in Akonadi
it has to be updated in Nepomuk and vice versa. This results in a process
that is responsible just for synchronizing the two stores.

- API Duplication
With the data residing in both Nepomuk and other stores
(Akonadi/Files/etc), it isn't always clear which store it should be accessed
from. This essentially results in duplication of APIs. Eg - Using KABC vs
accessing contacts from Nepomuk.

These problems would still exist even if we had the fastest and most
efficient RDF storage in the world.

At this point it was clear to us that the future was not going to be RDF.
The next thing we did was to analyze our actual needs based on the last 5
years of Nepomuk.

Our needs are -
* Full text index for searching
* Data store for simple objects such as tags / ratings / activities / etc
* Relations - Forming relations between different objects. Eg - This "file" is 
related to this "activity" or "person".
 
Each of these problems is independently solvable without RDF.
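
As a rough sketch of what "independently solvable" can look like, the three
needs above can each be served by a small, separate structure. This is only
an illustration and does not describe Baloo's actual design or API: the
class names, method names and item ids are all hypothetical.

```python
# Illustrative sketch only: class and method names are hypothetical and do
# not describe Baloo's actual design or API.
from collections import defaultdict


class FullTextIndex:
    """Inverted index: lowercase word -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.lower().split():
            self._postings[word].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every word of the query."""
        words = query.lower().split()
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


class SimpleStore:
    """Tags and ratings keyed directly by item id."""

    def __init__(self):
        self.tags = defaultdict(set)
        self.ratings = {}


class Relations:
    """Undirected links between arbitrary item ids."""

    def __init__(self):
        self._links = defaultdict(set)

    def relate(self, a, b):
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, item):
        return set(self._links.get(item, set()))


index = FullTextIndex()
index.add("file:/docs/report.txt", "Quarterly sales report")
index.add("file:/docs/notes.txt", "Meeting notes about sales")

store = SimpleStore()
store.tags["file:/docs/report.txt"].add("work")
store.ratings["file:/docs/report.txt"] = 8

relations = Relations()
relations.relate("file:/docs/report.txt", "activity:office")
```

None of the three pieces knows about the others; they only share the item
ids, which is what lets each one be replaced or tuned on its own.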

About 2 months ago we started to draft Baloo [1], a metadata solution that
will cover the bare necessities of each use case we have.

I'd like to avoid getting into the technical details of the implementation
in this thread. Another thread can be started about its different aspects
once you've read the basic ideas behind Baloo [2].

Current Plans
---------------------

After a month of designing the solution and a month of implementing it,
Baloo is working way better than Nepomuk does. So, I'd like to switch to
Baloo by default in 4.13, while keeping Nepomuk in maintenance mode for more
conservative distributions.

This is not a completely new project: large parts of the Baloo code are
derived from Nepomuk and therefore come with years of testing and real-world
use.
 
Baloo was also discussed at the PIM Sprint, and the PIM developers are happy
to completely drop Nepomuk support for 4.13 and move to Baloo. Similarly,
the Telepathy developers are working on moving KPeople away from Nepomuk.

Migration - There will be an automated migration mechanism for migrating
tags, ratings and comments from Nepomuk to Baloo.

Trying it out?
-------------------

Developers are welcome to try out Baloo and have a look at the current
source code. It's still a work in progress, but we strongly feel that it is
a step in the right direction.

I'd recommend using Milou [3] for searching.

-- 
Vishesh Handa

[1] https://projects.kde.org/projects/playground/base/baloo
[2] http://techbase.kde.org/Projects/Baloo
[3] https://projects.kde.org/projects/playground/base/milou