[Nepomuk] Virtuoso Problems - nao:userVisible

Ignacio Serantes kde at aynoa.net
Mon Aug 27 12:18:23 UTC 2012


Hi,

I'm really busy this days but this is my quick post.

On a quick test

SELECT DISTINCT *

WHERE {

  ?x0 a [ nao:userVisible "true"^^xsd:boolean ] .

}


returns 44330 records so an SQL EXIST() clause it's not the best
alternative. If there was a VIEW defined in the DBMS performance could be
better but filtering records with this kind of SQL query it's really far to
be optimal.


The best solution is the index one because the other alternatives are
ignoring nao:userVisible so it's not useful maintain this information if
you are not using it. But as you explain this is a big change so it's
better to wait.


Option 1 is the chosen one because you can't kill backwards compatibility
but consider to develop an automatic translation program for the database
because option 2 query looks like very good because there is no subquery.


I will try to do some test with Nepoogle when I have some free time.


By the way, good analysis.

On Wed, Aug 22, 2012 at 7:46 AM, Vishesh Handa <me at vhanda.in> wrote:

> Hey everyone
>
> In 4.9, most the queries on large datasets are impossibly slow and often
> cause virtuoso to completely lock up. So I've been going through the common
> queries that are passed to Nepomuk from a user perspective and been trying
> to optimize them.
>
> The most prevalent problem is that of the user visibility.
>
> Simple queries like listing all the tags seem to blow out of proportion
> with the added "FILTER EXISTS { ?r a [ nao:userVisible "true"^^xsd:boolean
> ] . }". If one looks the the SQL that is being generated one can see a
> drastic different
>
> "select ?r where { ?r a nao:Tag . }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_0-t0"."O")
>   AND  "s_1_0-t0"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> OPTION (QUIETCAST)
>
>
> "select ?r where { ?r a nao:Tag . FILTER EXISTS { ?r a [ nao:userVisible
> "true"^^xsd:boolean ] . } }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_0-t0"."O")
>   AND  "s_1_0-t0"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  EXISTS ( (
>      SELECT TOP 1 1 AS __ask_retval
>       FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
>         INNER JOIN DB.DBA.RDF_QUAD AS "s_1_4-t2"
>         ON ( "s_1_4-t1"."S"  = "s_1_4-t2"."O" )
>       WHERE "s_1_4-t1"."P" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible' ,
> 1))
>         AND  (1 - isiri_id ( "s_1_4-t1"."O"))
>         AND  "s_1_4-t1"."O" = DB.DBA.RDF_OBJ_OF_SQLVAL ( 1)
>         AND  "s_1_4-t2"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>         AND  isiri_id ( "s_1_4-t2"."O")
>         AND  "s_1_4-t2"."S"  = "s_1_0-t0"."S"
> OPTION (QUIETCAST)
>      ))
> OPTION (QUIETCAST)
>
> The second query results in an added query on every single result, and
> that additional query also contains an added join.
>
> On my system with 13k tags (yeah, I know), the system is completely
> unusable. Virtuoso pops up to 200% and takes about 5 minutes to respond.
> While I don't expect anyone to have 13k tags, people do have those many
> contacts or emails.
>
> Options on how to fix -
>
> 1. Use graphs with a filter -
>
> select ?r where { graph ?g { ?r a nao:Tag . } FILTER NOT EXISTS { ?g a
> nrl:Ontology. } }
>
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_1-t0"."O")
>   AND  "s_1_1-t0"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  not ( EXISTS ( (
>      SELECT TOP 1 1 AS __ask_retval
>       FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
>       WHERE "s_1_4-t1"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>         AND  isiri_id ( "s_1_4-t1"."O")
>         AND  "s_1_4-t1"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#Ontology' , 1))
>         AND  "s_1_4-t1"."S"  = "s_1_1-t0"."G"
> OPTION (QUIETCAST)
>      )))
> OPTION (QUIETCAST)
>
> This also results in an additional SQL query per resource, but it's still
> a LOT faster (no join in the exists query).
>
> 2.) Use graphs via nao:maintainedBy
>
> select ?r where { graph ?g { ?r a nao:Tag . } ?g nao:maintainedBy ?app . }'
>
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
>   INNER JOIN DB.DBA.RDF_QUAD AS "s_1_0-t1"
>   ON ( "s_1_0-t1"."S"  = "s_1_1-t0"."G" )
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_1-t0"."O")
>   AND  "s_1_1-t0"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  ( "s_1_0-t1"."S" < min_bnode_iri_id ())
>   AND  "s_1_0-t1"."P" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#maintainedBy' ,
> 1))
> OPTION (QUIETCAST)
>
> This would be the ideal solution, however it will kill backward
> compatibility cause all the graph don't have the nao:maintainedBy clause.
>
> 3.) Go SQL and add another column to our RDF_QUAD table which is indexed.
> That way we can always filter statements on the basis of visibility. Would
> be considerably faster than the join.
>
> I suggest we go with option 1 for 4.9, and option 2 for 4.10 and get rid
> of all the user visible stuff.
>
> Any suggestions?
>
> --
> Vishesh Handa
>
>
> _______________________________________________
> Nepomuk mailing list
> Nepomuk at kde.org
> https://mail.kde.org/mailman/listinfo/nepomuk
>
>


-- 
Best wishes,
Ignacio
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/nepomuk/attachments/20120827/82d01597/attachment.html>


More information about the Nepomuk mailing list