<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, May 27, 2013 at 2:28 PM, Sebastian Trüg <span dir="ltr"><<a href="mailto:sebastian@trueg.de" target="_blank">sebastian@trueg.de</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 05/26/2013 12:58 AM, Vishesh Handa wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hey guys<br>

<br>

I have made a very important discovery - The Storage service is a big<br>

bottleneck!<br>

<br>

Running a query such as - 'select * where { graph ?g { ?r ?p ?o. } }<br>

LIMIT 50000' by directly connecting to virtuoso via ODBC takes about<br>

2.65 seconds. In contrast running the same query by using the Nepomuk<br>

ResourceManager's main model takes about 19.5 seconds.<br>

<br>

Nepomuk internally uses the Soprano::LocalSocketClient to connect to the<br>

storage service which runs a Soprano::LocalServer.<br>

<br>

I've been trying to optimize this Soprano code for some time now, and<br>

since 4.9 we have gained a good 200% performance increase. But we can increase<br>

it a LOT more by just communicating directly with virtuoso.<br>

<br>

Pros -<br>

* 6-8x performance upgrade<br>

* The storage service isn't using such high cpu when reading<br>

* Accurate reporting - Suppose app 'x' does a costly query which<br>

requires a large number of results, then 'x' will have high cpu<br>

consumption. Currently both NepomukStorage and 'x' have very high cpu<br>

consumption.<br>

<br>

Cons -<br>

* Less Control - By having all queries go through the Nepomuk Storage we<br>

could theoretically build amazing tools to tell us which query is<br>

executing and how long it is taking. However, no such tool has ever been<br>

written - so we won't be losing anything.<br>

<br>

Before 4.10 this could never have been done because we used to have a<br>

lot of code in the storage service which handled removable media and<br>

other devices. This code would often modify the sparql queries and<br>

modify the results. With 4.10, I threw away all that code.<br>

<br>

Comments?<br>

<br>

PS: This is only for read-only operations. All writes should still go<br>

through the storage service. Though maybe we want to change that as well?<br>

</blockquote>

<br></div></div>

My 2 cents:<br>

<br>

You could even do this for write operations, but then clients would always need to use a client library which does all the checks and notifications. I suppose this is fine, but of course it requires writing, for example, a Python lib. Alternatively you could support both: direct ODBC writes via C++, and slower writes via the server (internally using the C++ client lib) for everyone else (for example scripts).<br>


<br>

All in all it seems like a good idea. I always liked the modular system with the storage service, but let's face it: it's a performance drain and in the end does not give us much besides a nice design.<br></blockquote>

<div><br></div><div>Doing it for writes seems a little messy right now. Integrating the ResourceWatcher is going to be hard.<br><br></div><div>I'm going to push my changes to remove the LocalServer and LocalClient. This is only for reads, since clients shouldn't be writing raw SPARQL to insert stuff.<br>
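<br>
To make the read/write split concrete, here is a minimal sketch, not the actual Nepomuk code. It assumes that looking at the SPARQL query form is enough to tell reads from writes. The names is_read_query and QueryRouter are hypothetical, and the two backends are plain callables standing in for a direct virtuoso ODBC connection and for the storage service.<br>

```python
import re

# Hypothetical illustration only: none of these names exist in Nepomuk or Soprano.
READ_FORMS = ("select", "ask", "construct", "describe")

# Matches a leading run of PREFIX / BASE declarations (the SPARQL prologue).
PROLOGUE = re.compile(r"^(?:(?:prefix\s+\S*\s*<[^>]*>|base\s+<[^>]*>)\s*)+")


def is_read_query(sparql: str) -> bool:
    """Return True if the query form after the prologue is a read form."""
    text = PROLOGUE.sub("", sparql.strip().lower())
    return text.startswith(READ_FORMS)


class QueryRouter:
    """Send reads to the direct backend and everything else to the storage service."""

    def __init__(self, direct_backend, storage_service):
        # Both arguments are assumed to be callables taking a query string.
        self.direct_backend = direct_backend
        self.storage_service = storage_service

    def execute(self, sparql: str):
        if is_read_query(sparql):
            return self.direct_backend(sparql)
        # Anything that is not clearly a read keeps going through the service,
        # so the checks and notifications done there are not bypassed.
        return self.storage_service(sparql)


if __name__ == "__main__":
    router = QueryRouter(
        direct_backend=lambda q: "direct",
        storage_service=lambda q: "storage service",
    )
    print(router.execute("select * where { graph ?g { ?r ?p ?o. } } LIMIT 50000"))
```

Defaulting to the storage service for anything unrecognised is deliberate in this sketch: a misclassified read only costs performance, while a misclassified write would skip the ResourceWatcher notifications.<br>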

 <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Cheers,<br>

Sebastian<div class="HOEnZb"><div class="h5"><br>

_______________________________________________<br>

Nepomuk mailing list<br>

<a href="mailto:Nepomuk@kde.org" target="_blank">Nepomuk@kde.org</a><br>

<a href="https://mail.kde.org/mailman/listinfo/nepomuk" target="_blank">https://mail.kde.org/mailman/listinfo/nepomuk</a><br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br><span style="color:rgb(192,192,192)">Vishesh Handa</span><br>

</div></div>