[Nepomuk] Re: Presentation, queries and other stuff

Mon Mar 7 14:30:18 CET 2011

You get soprano at soprano.sf.net.
As for compiling: check the INSTALL file that comes with it.
Maybe before trying to compile you should install soprano-devel from
your distribution to make sure you get all the dependencies for building it.

Cheers,
Sebastian

On 03/07/2011 12:42 PM, Ignacio Serantes wrote:
> 
> On Mon, Mar 7, 2011 at 10:09 AM, Sebastian Trüg <trueg at kde.org
> <mailto:trueg at kde.org>> wrote:
> 
>     I forgot one thing:
> 
>     about the web interface. Nepomuk is running Virtuoso in lite mode. To
>     disable that you need to hack Soprano and comment line 328 in
>     soprano/backends/virtuoso/virtuosocontroller.cpp.
>     Sadly there is no configuration option for that at the moment.
> 
> 
> All I know about compiling in linux is:
> 1) Download and uncompress tgz.
> 2) ./configure && make && sudo make install
> 
> Could Soprano be compiled with my skills? Where I can download the tgz?
> 
>     Cheers,
>     Sebastian
> 
>     On 03/06/2011 02:33 PM, Ignacio Serantes wrote:
>     > On Wed, Mar 2, 2011 at 12:24 PM, Sebastian Trüg <trueg at kde.org
>     <mailto:trueg at kde.org>
>     > <mailto:trueg at kde.org <mailto:trueg at kde.org>>> wrote:
>     >
>     > Hi!
>     >
>     > Sorry for the lag but I'm really busy.
>     >
>     >     On 03/02/2011 11:57 AM, Ignacio Serantes wrote:
>     >     > On Wed, Mar 2, 2011 at 9:55 AM, Sebastian Trüg
>     <trueg at kde.org <mailto:trueg at kde.org>
>     >     <mailto:trueg at kde.org <mailto:trueg at kde.org>>
>     >     > <mailto:trueg at kde.org <mailto:trueg at kde.org>
>     <mailto:trueg at kde.org <mailto:trueg at kde.org>>>> wrote:
>     >
>     >
>     >
>     >     > This is worrying because many problems can't be solved with
>     brutal
>     >     > queries with brutal inner joins. This stuff must be added if
>     you want
>     >     > that nepomuk db works with a limited amount of RAM and CPU.
>     >
>     >     Please make suggestions.
>     >
>     >
>     >     > The story of my life, I'm always be an step before a working
>     >     nepomuk :).
>     >     > Can you explain me a little bit how are this final
>     optimizations?
>     >
>     >     A better visibility check directly on the resource. This also
>     requires a
>     >     nao:userVisible property on all resources which is done by a
>     new class.
>     >
>     >
>     > Great!, precisely this is the big problem I found with your
>     queries and
>     > you resolved adding a flag. Please, check the following simple queries
>     > with one string tests. Strings are selected to obtain result sets with
>     > different size.
>     >
>     > Test #01 [    API_H1,       (dorama)]:    93 results in   0.06073117
>     > seconds
>     > Test #02 [    API_H2,       (dorama)]:    93 results in   1.84762883
>     > seconds
>     > Test #03 [    API_H3,       (dorama)]:    93 results in  
>     2.01492691 seconds
>     > Test #04 [    API_H4,       (dorama)]:    93 results in  
>     1.91813993 seconds
>     > Test #05 [    API_H5,       (dorama)]:    93 results in  
>     1.60597801 seconds
>     > Test #06 [       API,       (dorama)]:    93 results in  
>     2.23234606 seconds
>     > Test #07 [    API_H1,    (ha ji won)]:     2 results in   0.01688004
>     > seconds
>     > Test #08 [    API_H2,    (ha ji won)]:     2 results in  
>     5.00310612 seconds
>     > Test #09 [    API_H3,    (ha ji won)]:     2 results in  
>     4.11244702 seconds
>     > Test #10 [    API_H4,    (ha ji won)]:     2 results in  
>     3.95332003 seconds
>     > Test #11 [    API_H5,    (ha ji won)]:     2 results in  
>     3.90592694 seconds
>     > Test #12 [       API,    (ha ji won)]:     2 results in
>      11.18835807 seconds
>     > Test #13 [    API_H1,        (music)]:  5434 results in  
>     1.39107108 seconds
>     > Test #14 [    API_H2,        (music)]:  5420 results in  
>     4.74685478 seconds
>     > Test #15 [    API_H3,        (music)]:  5420 results in  
>     5.27211499 seconds
>     > Test #16 [    API_H4,        (music)]:  5420 results in  
>     5.14297509 seconds
>     > Test #17 [    API_H5,        (music)]:  5420 results in  
>     4.99993420 seconds
>     > Test #18 [       API,        (music)]:  5420 results in  
>     9.88414216 seconds
>     >
>     > The query is the same but I applied minor changes:
>     > API_H1: is the same query without visibility inner join so easy to
>     > wonder where is the main performance problem.
>     > API_H2: is the query using a subquery and not an inner join. Query is
>     > equivalent (A U B) X C = (A X C) U (B X C). Because subquery is
>     the same
>     > could be optimized by query optimizer and, because intermediate joins
>     > result sets are small union is fast. This is easy to see with a simple
>     > SELECT DISTINCT * in both queries and comparing result sets.
>     > API_H3: is like H1 without using optional to obtain columns.
>     > API_H4: is API_H3 with a different filter construction method.
>     > API_H5: is API_H3 with another different filter construction method.
>     > API: is the query created by the API so no additional explanation is
>     > needed. It is the last one to give it cache advantage over other
>     queries.
>     > Note: API_H1 results number differs in "music" case because this query
>     > is not equivalent to the other five.
>     >
>     > This is an initial test and with my current db knowledge caution is
>     > mandatory but here are some initial conclusions:
>     > 1) As in SQL, subqueries increment performance
>     > over indiscriminated inner joins in large results sets. I'm not
>     sure if
>     > this is general to all tripletstore dbms or only to Virtuoso that is a
>     > rdbms with added tripletstore functionally. Seems like search API must
>     > build queries use less joins and more subqueries.
>     > 2) Optional seems haven't performance impact extracting column values
>     > but queries are less easy to read and you must write more
>     characters :).
>     > 3) API_H3, API_H4 and API_H5 has similar times so without a profiling
>     > tool and more test is difficult to wonder what is the best. Probably
>     > query optimizer is doing it's job and, in fact, there is no
>     differences
>     > at all.
>     >
>     > In "sparql_test.spql" attach there are the queries.
>     >
>     > I go back to stored procedures because this problem could be solved
>     > using Virtuoso. You have the need to filter for an value always so you
>     > construct a relation any time you need to filter for this value. The
>     > problem with this approach is that any time you do a query you must
>     > build again this relation and this is time consuming. And the
>     problem is
>     > more and more serious when data grows.
>     >
>     > In my db there are 23.766 results to userVisible = True and doing all
>     > queries with a join with a table like this is not a good idea and we
>     > can't trust that query optimizers do all your job without any
>     > help. There are some solutions to this problem at db level:
>     >
>     > 1) The (probably) ideal one, create a view in the dbms using your
>     query
>     > and use always this to filter your data. If query is very simple and
>     > dbms is good this is automaintained and fast because view don't exists
>     > and is only a different representation of data in your db.
>     >
>     > 2) If the first approach don't works, view is slow and general db
>     > performance is degrading, you must create a table to store this
>     data and
>     > use stored procedures and triggers to maintain this table. A view from
>     > first case and a physical table is the same at query language
>     level so,
>     > if we tried the first case, most of the work is done and your queries
>     > need only minor changes or even none.
>     >
>     > 3) The easy one, add a property with and index to do the filter. You
>     > told that this is your approach and works but you must be cautious
>     > because now is visibility, later is user restriction, later devices
>     > restriction and so on, and you can't solve all your filter
>     problems with
>     > properties as you can't solve all your query problems with joins.
>     > Increase your size register and add so many indexes could have a
>     > performance penalty when your data grows but note that this
>     differs from
>     > one dbms to other so there is no general rules.
>     >
>     > You can solve this problem in application layer too but, if the
>     problem
>     > is db related must be solved at db level if if possible. Obviously
>     this
>     > is a relaxed rule.
>     >
>     > Because, sadly, there is no perfect solution you must try
>     > different approaches and use the solution that better suits to any
>     case
>     > and, here, profiling tools are your friends. About this, I will try to
>     > activate Virtuoso web interface to activate profiling but I can't. I
>     > download and compile Virtuoso and with my instance web interface is
>     > available but I can't imagine how activate it in Nepomuk's Virtuoso
>     > instance. I can copying the db and use it in my Virtuoso instance but
>     > this is uncomfortably if I need doing changes in data using Nepomuk.
>     >
>     > I found many errors in soprano-virtuoso db log. I created a db from
>     > scratch and this errors persists: no method of name existsNode,
>     > getClobVal, getNumVal, getSchemaURL etc... I'm informing you because I
>     > don't know if this errors are relevant or not.
>     >
>     > I'm sorry if I'm not much help but I'm doing my best. I never worked
>     > before with Sparql and tripletstore but I'm learning.
>     >
>     >
>     >     Cheers,
>     >     Sebastian
>     >     _______________________________________________
>     >     Nepomuk mailing list
>     >     Nepomuk at kde.org <mailto:Nepomuk at kde.org>
>     <mailto:Nepomuk at kde.org <mailto:Nepomuk at kde.org>>
>     >     https://mail.kde.org/mailman/listinfo/nepomuk
>     >
>     >
>     >
>     >
>     > --
>     > Cheers,
>     > Ignacio
>     >
>     >
>     >
>     >
>     > _______________________________________________
>     > Nepomuk mailing list
>     > Nepomuk at kde.org <mailto:Nepomuk at kde.org>
>     > https://mail.kde.org/mailman/listinfo/nepomuk
>     _______________________________________________
>     Nepomuk mailing list
>     Nepomuk at kde.org <mailto:Nepomuk at kde.org>
>     https://mail.kde.org/mailman/listinfo/nepomuk
> 
> 
> 
> 
> -- 
> Cheers,
> Ignacio
> 
> 
> 
> 
> _______________________________________________
> Nepomuk mailing list
> Nepomuk at kde.org
> https://mail.kde.org/mailman/listinfo/nepomuk