<br><div class="gmail_quote">On Mon, Mar 7, 2011 at 10:09 AM, Sebastian Trüg <span dir="ltr">&lt;<a href="mailto:trueg@kde.org" target="_blank">trueg@kde.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


I forgot one thing:<br>

<br>

About the web interface: Nepomuk runs Virtuoso in lite mode. To<br>

disable that, you need to hack Soprano and comment out line 328 in<br>

soprano/backends/virtuoso/virtuosocontroller.cpp.<br>

Sadly, there is no configuration option for that at the moment.<br></blockquote><div><br></div><div>All I know about compiling on Linux is:</div><div>1) Download and uncompress the tgz.</div><div>2) ./configure &amp;&amp; make &amp;&amp; sudo make install</div>


<div><br></div><div>Could I compile Soprano with my skills? Where can I download the tgz?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Cheers,<br>


Sebastian<br>

<div><br>

On 03/06/2011 02:33 PM, Ignacio Serantes wrote:<br>

&gt; On Wed, Mar 2, 2011 at 12:24 PM, Sebastian Trüg &lt;<a href="mailto:trueg@kde.org" target="_blank">trueg@kde.org</a>&gt; wrote:<br>

</div><div>

&gt;<br>

&gt; Hi!<br>

&gt;<br>

&gt; Sorry for the lag, but I&#39;m really busy.<br>

&gt;<br>

&gt;     On 03/02/2011 11:57 AM, Ignacio Serantes wrote:<br>

&gt;     &gt; On Wed, Mar 2, 2011 at 9:55 AM, Sebastian Trüg &lt;<a href="mailto:trueg@kde.org" target="_blank">trueg@kde.org</a>&gt; wrote:<br>

</div><div><div></div><div>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;     &gt; This is worrying because many problems can&#39;t be solved with<br>

&gt;     &gt; heavyweight queries full of inner joins. This stuff must be added if you want<br>

&gt;     &gt; the Nepomuk db to work with a limited amount of RAM and CPU.<br>

&gt;<br>

&gt;     Please make suggestions.<br>

&gt;<br>

&gt;<br>

&gt;     &gt; The story of my life: I&#39;m always one step short of a working<br>

&gt;     &gt; Nepomuk :).<br>

&gt;     &gt; Can you explain a little bit how these final optimizations work?<br>

&gt;<br>

&gt;     A better visibility check directly on the resource. This also requires a<br>

&gt;     nao:userVisible property on all resources, which is handled by a new class.<br>

&gt;<br>

&gt;<br>

&gt; Great! This is precisely the big problem I found with your queries, and<br>

&gt; you solved it by adding a flag. Please check the following simple<br>

&gt; single-string test queries. The strings were chosen to produce result sets<br>

&gt; of different sizes.<br>

&gt;<br>

&gt; Test #01 [    API_H1,       (dorama)]:    93 results in   0.06073117 seconds<br>

&gt; Test #02 [    API_H2,       (dorama)]:    93 results in   1.84762883 seconds<br>

&gt; Test #03 [    API_H3,       (dorama)]:    93 results in   2.01492691 seconds<br>

&gt; Test #04 [    API_H4,       (dorama)]:    93 results in   1.91813993 seconds<br>

&gt; Test #05 [    API_H5,       (dorama)]:    93 results in   1.60597801 seconds<br>

&gt; Test #06 [       API,       (dorama)]:    93 results in   2.23234606 seconds<br>

&gt; Test #07 [    API_H1,    (ha ji won)]:     2 results in   0.01688004 seconds<br>

&gt; Test #08 [    API_H2,    (ha ji won)]:     2 results in   5.00310612 seconds<br>

&gt; Test #09 [    API_H3,    (ha ji won)]:     2 results in   4.11244702 seconds<br>

&gt; Test #10 [    API_H4,    (ha ji won)]:     2 results in   3.95332003 seconds<br>

&gt; Test #11 [    API_H5,    (ha ji won)]:     2 results in   3.90592694 seconds<br>

&gt; Test #12 [       API,    (ha ji won)]:     2 results in  11.18835807 seconds<br>

&gt; Test #13 [    API_H1,        (music)]:  5434 results in   1.39107108 seconds<br>

&gt; Test #14 [    API_H2,        (music)]:  5420 results in   4.74685478 seconds<br>

&gt; Test #15 [    API_H3,        (music)]:  5420 results in   5.27211499 seconds<br>

&gt; Test #16 [    API_H4,        (music)]:  5420 results in   5.14297509 seconds<br>

&gt; Test #17 [    API_H5,        (music)]:  5420 results in   4.99993420 seconds<br>

&gt; Test #18 [       API,        (music)]:  5420 results in   9.88414216 seconds<br>

&gt;<br>

&gt; The query is the same in every test, but I applied minor changes:<br>

&gt; API_H1: the same query without the visibility inner join, so it is easy to<br>

&gt; see where the main performance problem is.<br>

&gt; API_H2: the query using a subquery instead of an inner join. The queries are<br>

&gt; equivalent because (A ∪ B) × C = (A × C) ∪ (B × C). Because the subquery is the<br>

&gt; same, it can be optimized by the query optimizer, and because the intermediate<br>

&gt; join result sets are small, the union is fast. This is easy to verify by running a<br>

&gt; simple SELECT DISTINCT * on both queries and comparing the result sets.<br>

&gt; API_H3: like H1, but without using OPTIONAL to obtain columns.<br>

&gt; API_H4: API_H3 with a different filter construction method.<br>

&gt; API_H5: API_H3 with yet another filter construction method.<br>

&gt; API: the query generated by the API, so no further explanation is<br>

&gt; needed. It is run last, which gives it a cache advantage over the other queries.<br>

&gt; Note: the API_H1 result count differs in the &quot;music&quot; case because this query<br>

&gt; is not equivalent to the other five.<br>
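The set identity that API_H2 relies on can be checked on toy data. This is a minimal Python sketch, assuming relations are modelled as sets of tuples and joined on a shared subject column; the resource names and values are hypothetical, and it says nothing about how Virtuoso actually executes either query.

```python
from itertools import product


def join(left, right):
    """Equi-join two sets of (subject, value) pairs on the subject column."""
    return {
        (s1, v1, v2)
        for (s1, v1), (s2, v2) in product(left, right)
        if s1 == s2
    }


# Hypothetical data: A and B stand for two text-match branches of a query,
# C plays the role of the visibility relation.
A = {("res1", "dorama"), ("res2", "dorama")}
B = {("res2", "ha ji won"), ("res3", "ha ji won")}
C = {("res1", True), ("res3", True)}

# One join over the union ...
lhs = join(A | B, C)
# ... yields the same set as the union of two smaller joins.
rhs = join(A, C) | join(B, C)

print(lhs == rhs)
print(sorted(lhs))
```

Because each per-branch join only touches the rows of one branch, the intermediate results stay small, which is the intuition given above for why the union form can be fast.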

&gt;<br>

&gt; This is an initial test, and given my current db knowledge caution is<br>

&gt; mandatory, but here are some initial conclusions:<br>

&gt; 1) As in SQL, subqueries improve performance<br>

&gt; over indiscriminate inner joins on large result sets. I&#39;m not sure if<br>

&gt; this applies to all triplestore DBMSs or only to Virtuoso, which is an<br>

&gt; RDBMS with added triplestore functionality. It seems the search API should<br>

&gt; build queries that use fewer joins and more subqueries.<br>

&gt; 2) OPTIONAL seems to have no performance impact when extracting column values,<br>

&gt; but queries are harder to read and you must type more characters :).<br>

&gt; 3) API_H3, API_H4 and API_H5 have similar times, so without a profiling<br>

&gt; tool and more tests it is difficult to tell which is best. Probably the<br>

&gt; query optimizer is doing its job and, in fact, there are no differences<br>

&gt; at all.<br>

&gt;<br>

&gt; The queries are in the attached &quot;sparql_test.spql&quot; file.<br>

&gt;<br>

&gt; I come back to stored procedures because this problem could be solved<br>

&gt; inside Virtuoso. You always need to filter on a value, so you<br>

&gt; construct a relation every time you need to filter on it. The<br>

&gt; problem with this approach is that every query must rebuild<br>

&gt; this relation, which is time-consuming, and the problem gets<br>

&gt; more and more serious as the data grows.<br>

&gt;<br>

&gt; In my db there are 23,766 results for userVisible = true, and joining every<br>

&gt; query against a table like this is not a good idea; we<br>

&gt; can&#39;t trust query optimizers to do all the work without any<br>

&gt; help. There are some solutions to this problem at the db level:<br>

&gt;<br>

&gt; 1) The (probably) ideal one: create a view in the DBMS using your query<br>

&gt; and always use it to filter your data. If the query is very simple and<br>

&gt; the DBMS is good, this is self-maintaining and fast, because the view doesn&#39;t<br>

&gt; physically exist; it is only a different representation of the data in your db.<br>

&gt;<br>

&gt; 2) If the first approach doesn&#39;t work (the view is slow and general db<br>

&gt; performance degrades), you must create a table to store this data and<br>

&gt; use stored procedures and triggers to maintain it. A view from the<br>

&gt; first case and a physical table look the same at the query-language level, so<br>

&gt; if we have tried the first case, most of the work is done and your queries<br>

&gt; need only minor changes, or even none.<br>
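A minimal sketch of options 1 and 2, using Python&#39;s built-in sqlite3 module as a stand-in for Virtuoso (whose view, trigger and stored-procedure syntax differs). The triples table, the 'true' literal, the nie:title row and all view/table/trigger names are hypothetical. It assumes each triple is stored only once and omits an UPDATE trigger for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE triples (s TEXT, p TEXT, o TEXT);

-- Option 1: a view. Nothing is stored; the filter is evaluated on each use.
CREATE VIEW visible_view AS
    SELECT s FROM triples WHERE p = 'nao:userVisible' AND o = 'true';

-- Option 2: a physical table kept in sync by triggers.
CREATE TABLE visible_table (s TEXT PRIMARY KEY);

CREATE TRIGGER visible_ins AFTER INSERT ON triples
WHEN NEW.p = 'nao:userVisible' AND NEW.o = 'true'
BEGIN
    INSERT OR IGNORE INTO visible_table (s) VALUES (NEW.s);
END;

CREATE TRIGGER visible_del AFTER DELETE ON triples
WHEN OLD.p = 'nao:userVisible' AND OLD.o = 'true'
BEGIN
    DELETE FROM visible_table WHERE s = OLD.s;
END;
""")

cur.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("res1", "nao:userVisible", "true"),
        ("res2", "nao:userVisible", "false"),
        ("res3", "nao:userVisible", "true"),
        ("res1", "nie:title", "dorama"),
    ],
)
# Removing res3's flag fires visible_del and drops it from visible_table.
cur.execute("DELETE FROM triples WHERE s = 'res3' AND p = 'nao:userVisible'")


def visible(source):
    # The same SELECT works against the view and the table; `source` is one
    # of the two fixed names above, never user input.
    return sorted(row[0] for row in cur.execute(f"SELECT s FROM {source}"))


print(visible("visible_view"))
print(visible("visible_table"))
```

Both sources answer the same SELECT, so moving from the view to the trigger-maintained table needs little or no change in the queries that use it.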

&gt;<br>

&gt; 3) The easy one: add a property with an index to do the filtering. You<br>

&gt; said this is your approach and that it works, but you must be cautious,<br>

&gt; because today it is visibility, later it is user restrictions, then device<br>

&gt; restrictions and so on, and you can&#39;t solve all your filtering problems with<br>

&gt; properties, just as you can&#39;t solve all your query problems with joins.<br>

&gt; Increasing your record size and adding many indexes could carry a<br>

&gt; performance penalty as your data grows, but note that this differs from<br>

&gt; one DBMS to another, so there are no general rules.<br>
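Option 3 in the same hypothetical sqlite3 setting: the flag is stored as an indexed column on a resources table (table, column and index names are illustrative, not Nepomuk&#39;s schema), and EXPLAIN QUERY PLAN is used to check that the filter is served through the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Option 3: store the flag directly on each resource and index it.
conn.execute(
    "CREATE TABLE resources (uri TEXT PRIMARY KEY, user_visible INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX idx_user_visible ON resources (user_visible)")
conn.executemany(
    "INSERT INTO resources VALUES (?, ?)",
    [("res1", 1), ("res2", 0), ("res3", 1)],
)

query = "SELECT uri FROM resources WHERE user_visible = 1"
visible = sorted(row[0] for row in conn.execute(query))

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail,
# which names the index when the WHERE clause is answered through it.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(visible)
print(plan)
```

Each further filter (user, device, and so on) would need its own column and index, which is the growth in record size and index count the paragraph above warns about.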

&gt;<br>

&gt; You can solve this problem in the application layer too, but if the problem<br>

&gt; is db-related it should be solved at the db level if possible. Obviously this<br>

&gt; is a relaxed rule.<br>

&gt;<br>

&gt; Because, sadly, there is no perfect solution, you must try<br>

&gt; different approaches and use the solution that best suits each case,<br>

&gt; and here profiling tools are your friends. On that note, I tried to<br>

&gt; enable the Virtuoso web interface to turn on profiling, but I couldn&#39;t. I<br>

&gt; downloaded and compiled Virtuoso, and in my own instance the web interface is<br>

&gt; available, but I can&#39;t figure out how to enable it in Nepomuk&#39;s Virtuoso<br>

&gt; instance. I can copy the db and use it in my Virtuoso instance, but<br>

&gt; this is inconvenient if I need to change data using Nepomuk.<br>

&gt;<br>

&gt; I found many errors in the soprano-virtuoso db log. I created a db from<br>

&gt; scratch and these errors persist: no method of name existsNode,<br>

&gt; getClobVal, getNumVal, getSchemaURL, etc. I&#39;m telling you because I<br>

&gt; don&#39;t know whether these errors are relevant.<br>

&gt;<br>

&gt; I&#39;m sorry if I&#39;m not much help, but I&#39;m doing my best. I had never worked<br>

&gt; with SPARQL or triplestores before, but I&#39;m learning.<br>

&gt;<br>

&gt;<br>

&gt;     Cheers,<br>

&gt;     Sebastian<br>

&gt;     _______________________________________________<br>

&gt;     Nepomuk mailing list<br>

</div></div>&gt;     <a href="mailto:Nepomuk@kde.org" target="_blank">Nepomuk@kde.org</a> &lt;mailto:<a href="mailto:Nepomuk@kde.org" target="_blank">Nepomuk@kde.org</a>&gt;<br>

<div><div></div><div>&gt;     <a href="https://mail.kde.org/mailman/listinfo/nepomuk" target="_blank">https://mail.kde.org/mailman/listinfo/nepomuk</a><br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt;<br>

&gt; --<br>

&gt; Cheers,<br>

&gt; Ignacio<br>


</div></div></blockquote></div><br><br clear="all"><br>-- <br>Cheers,<div>Ignacio</div><div><br></div><br>