[Nepomuk] Virtuoso Problems - nao:userVisible

Gaël Beaudoin gaboo at gaboo.org
Wed Aug 22 08:22:36 UTC 2012


On 22/08/2012 07:46, Vishesh Handa wrote:
> Hey everyone
>
> In 4.9, most of the queries on large datasets are impossibly slow and 
> often cause virtuoso to completely lock up. So I've been going through 
> the common queries that are passed to Nepomuk from a user perspective 
> and been trying to optimize them.
>
> The most prevalent problem is that of the user visibility.
>
> Simple queries like listing all the tags seem to blow out of 
> proportion with the added " FILTER EXISTS { ?r a [ nao:userVisible 
> "true"^^xsd:boolean ] . }". If one looks at the SQL that is being 
> generated, one can see a drastic difference:
>
> "select ?r where { ?r a nao:Tag . }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft( 
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_0-t0"."O")
>   AND  "s_1_0-t0"."O" = __i2idn ( __bft( 
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> OPTION (QUIETCAST)
>
>
> "select ?r where { ?r a nao:Tag . FILTER EXISTS { ?r a [ 
> nao:userVisible "true"^^xsd:boolean ] . } }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft( 
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_0-t0"."O")
>   AND  "s_1_0-t0"."O" = __i2idn ( __bft( 
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  EXISTS ( (
>      SELECT TOP 1 1 AS __ask_retval
>       FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
>         INNER JOIN DB.DBA.RDF_QUAD AS "s_1_4-t2"
>         ON ( "s_1_4-t1"."S"  = "s_1_4-t2"."O" )
>       WHERE "s_1_4-t1"."P" = __i2idn ( __bft( 
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible' 
> , 1))
>         AND  (1 - isiri_id ( "s_1_4-t1"."O"))
>         AND  "s_1_4-t1"."O" = DB.DBA.RDF_OBJ_OF_SQLVAL ( 1)
>         AND  "s_1_4-t2"."P" = __i2idn ( __bft( 
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>         AND  isiri_id ( "s_1_4-t2"."O")
>         AND  "s_1_4-t2"."S"  = "s_1_0-t0"."S"
> OPTION (QUIETCAST)
>      ))
> OPTION (QUIETCAST)
>
> The second query results in an added query on every single result, and 
> that additional query also contains an added join.
>
> On my system with 13k tags (yeah, I know), the system is completely 
> unusable. Virtuoso shoots up to 200% CPU and takes about 5 minutes to 
> respond. While I don't expect anyone to have 13k tags, people do have 
> that many contacts or emails.
>
> Options on how to fix -
>
> 1. Use graphs with a filter -
>
> select ?r where { graph ?g { ?r a nao:Tag . } FILTER NOT EXISTS { ?g a 
> nrl:Ontology. } }
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft( 
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_1-t0"."O")
>   AND  "s_1_1-t0"."O" = __i2idn ( __bft( 
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  not ( EXISTS ( (
>      SELECT TOP 1 1 AS __ask_retval
>       FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
>       WHERE "s_1_4-t1"."P" = __i2idn ( __bft( 
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>         AND  isiri_id ( "s_1_4-t1"."O")
>         AND  "s_1_4-t1"."O" = __i2idn ( __bft( 
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#Ontology' , 1))
>         AND  "s_1_4-t1"."S"  = "s_1_1-t0"."G"
> OPTION (QUIETCAST)
>      )))
> OPTION (QUIETCAST)
>
> This also results in an additional SQL query per resource, but it's 
> still a LOT faster (no join in the exists query).
>
> 2. Use graphs via nao:maintainedBy -
>
> select ?r where { graph ?g { ?r a nao:Tag . } ?g nao:maintainedBy ?app 
> . }
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
>   INNER JOIN DB.DBA.RDF_QUAD AS "s_1_0-t1"
>   ON ( "s_1_0-t1"."S"  = "s_1_1-t0"."G" )
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft( 
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_1-t0"."O")
>   AND  "s_1_1-t0"."O" = __i2idn ( __bft( 
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  ( "s_1_0-t1"."S" < min_bnode_iri_id ())
>   AND  "s_1_0-t1"."P" = __i2idn ( __bft( 
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#maintainedBy' , 
> 1))
> OPTION (QUIETCAST)
>
> This would be the ideal solution. However, it will break backward 
> compatibility because not all graphs have a nao:maintainedBy statement.
>
> 3. Go down to the SQL level and add another, indexed column to our 
> RDF_QUAD table. That way we can always filter statements on the basis 
> of visibility. This would be considerably faster than the join.
>
> I suggest we go with option 1 for 4.9, and option 2 for 4.10 and get 
> rid of all the user visible stuff.
>
> Any suggestions?
>
> -- 
> Vishesh Handa

Why not try 3? It looks like a simple and obvious solution from my point 
of view. I'm not a SPARQL guy, but I'm used to dealing with SQL and 
databases. I understand it's not the most elegant solution, but it will 
scale much better, and fast is never fast enough.
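
For what it's worth, here is a rough sketch of the idea behind option 3. 
It uses Python's sqlite3 module as a stand-in, not Virtuoso, and the table 
name `quad`, the `visible` column and the index are all made up for 
illustration; the real RDF_QUAD layout is different. It only shows that 
visibility could become a plain indexed predicate on the statement itself 
instead of a correlated EXISTS subquery with a join.

```python
import sqlite3


def build_store(n_tags=1000):
    """Create an in-memory quad table with a hypothetical 'visible' column."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout: graph, subject, predicate, object, plus a flag.
    conn.execute(
        "CREATE TABLE quad ("
        "g TEXT, s TEXT, p TEXT, o TEXT, "
        "visible INTEGER NOT NULL DEFAULT 1)"
    )
    # The index covers predicate, object and the visibility flag, so the
    # filter below can be answered without a per-row subquery.
    conn.execute("CREATE INDEX quad_pov ON quad (p, o, visible)")

    rows = [
        ("g:data", f"tag:{i}", "rdf:type", "nao:Tag", 1)
        for i in range(n_tags)
    ]
    # A few statements that should stay hidden from the user.
    rows += [
        ("g:onto", f"hidden:{i}", "rdf:type", "nao:Tag", 0)
        for i in range(5)
    ]
    conn.executemany("INSERT INTO quad VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def visible_tags(conn):
    """List tag subjects, filtering on the flag instead of using EXISTS."""
    cur = conn.execute(
        "SELECT s FROM quad WHERE p = ? AND o = ? AND visible = 1",
        ("rdf:type", "nao:Tag"),
    )
    return [row[0] for row in cur]
```

Calling `visible_tags(build_store(13000))` returns only the rows flagged 
as visible. Whether Virtuoso would cope well with an extra column on its 
quad table is a separate question that I can't answer.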

My 2 cents :)
Gaël