[Nepomuk] [RFC] Avoid communicating through the Nepomuk Storage

Vishesh Handa me at vhanda.in
Sat May 25 22:58:01 UTC 2013


Hey guys

I have made a very important discovery - The Storage service is a big
bottleneck!

Running a query such as 'select * where { graph ?g { ?r ?p ?o. } } LIMIT
50000' by connecting directly to Virtuoso via ODBC takes about 2.65
seconds. In contrast, running the same query through the Nepomuk
ResourceManager's main model takes about 19.5 seconds.
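
For anyone who wants to repeat the comparison, here is a minimal timing
sketch. It is not the code used for the numbers above: secondsTaken and
slowdownFactor are hypothetical helper names, and the callable passed in
stands in for whichever query path is being measured (direct ODBC or the
ResourceManager's main model).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in seconds.
double secondsTaken(const std::function<void()>& work) {
    const auto start = std::chrono::steady_clock::now();
    work();  // e.g. execute the query and iterate over all results
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Hypothetical helper: how many times slower the storage-service path
// is compared to the direct path.
double slowdownFactor(double viaStorageSeconds, double directSeconds) {
    assert(directSeconds > 0.0);
    return viaStorageSeconds / directSeconds;
}
```

With the timings quoted above, slowdownFactor(19.5, 2.65) comes out at
roughly 7.4.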

Nepomuk internally uses the Soprano::LocalSocketClient to connect to the
storage service which runs a Soprano::LocalServer.

I've been trying to optimize this Soprano code for some time now and from
4.9 we have a good 200% performance increase. But we can increase it a LOT
more by just communicating directly with Virtuoso.

Pros -
* 6-8x performance improvement
* The storage service no longer uses so much CPU when reading
* Accurate reporting - Suppose app 'x' does a costly query which returns a
large number of results, then only 'x' will show high CPU consumption.
Currently both NepomukStorage and 'x' have very high CPU consumption.

Cons -
* Less Control - By having all queries go through the Nepomuk Storage we
could theoretically build amazing tools to tell us which query is executing
and how long it is taking. However, no such tool has ever been written - so
we won't be losing anything.

Before 4.10 this could never have been done because we used to have a lot
of code in the storage service which handled removable media and other
devices. This code would often modify the SPARQL queries and modify the
results. With 4.10, I threw away all that code.

Comments?

PS: This is only for read-only operations. All writes should still go
through the storage service. Though maybe we want to change that as well?
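
To make the read/write split concrete, here is a rough sketch of how a
client could decide which path a query takes. This is purely illustrative
and not existing Nepomuk or Soprano code: Route and routeFor are
hypothetical names, and classifying a query by its leading keyword is an
assumption on my part. Anything not recognised as a read (including
queries that start with a PREFIX declaration) falls back to the storage
service.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical: the two paths a query could take.
enum class Route { DirectVirtuoso, StorageService };

// Hypothetical classifier: look at the first keyword of the query.
// Read-only query forms go directly to Virtuoso; everything else,
// including anything we do not recognise, goes through the storage service.
Route routeFor(const std::string& sparql) {
    // Skip leading whitespace.
    const auto begin = std::find_if(sparql.begin(), sparql.end(),
        [](unsigned char c) { return !std::isspace(c); });
    // The keyword is the following run of letters.
    const auto end = std::find_if(begin, sparql.end(),
        [](unsigned char c) { return !std::isalpha(c); });

    std::string keyword(begin, end);
    std::transform(keyword.begin(), keyword.end(), keyword.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const bool readOnly = keyword == "select" || keyword == "ask" ||
                          keyword == "describe" || keyword == "construct";
    return readOnly ? Route::DirectVirtuoso : Route::StorageService;
}
```

Defaulting to the storage service keeps the sketch conservative: a
misclassified read is merely slow, while a write never bypasses the
service.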

-- 
Vishesh Handa