[Nepomuk] Virtuoso Problems - nao:userVisible

Vishesh Handa me at vhanda.in
Tue Oct 2 04:10:46 UTC 2012


Ping?

4.9.2 is out. And we definitely need to fix this in time for 4.10.

I can experiment with the extra column approach, though it seems pretty
unclean to me. (Unclean = not rdf based)

On Wed, Aug 22, 2012 at 11:16 AM, Vishesh Handa <me at vhanda.in> wrote:

> Hey everyone
>
> In 4.9, most the queries on large datasets are impossibly slow and often
> cause virtuoso to completely lock up. So I've been going through the common
> queries that are passed to Nepomuk from a user perspective and been trying
> to optimize them.
>
> The most prevalent problem is that of the user visibility.
>
> Simple queries like listing all the tags seem to blow out of proportion
> with the added "FILTER EXISTS { ?r a [ nao:userVisible "true"^^xsd:boolean
> ] . }". If one looks the the SQL that is being generated one can see a
> drastic different
>
> "select ?r where { ?r a nao:Tag . }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_0-t0"."O")
>   AND  "s_1_0-t0"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> OPTION (QUIETCAST)
>
>
> "select ?r where { ?r a nao:Tag . FILTER EXISTS { ?r a [ nao:userVisible
> "true"^^xsd:boolean ] . } }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_0-t0"."O")
>   AND  "s_1_0-t0"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  EXISTS ( (
>      SELECT TOP 1 1 AS __ask_retval
>       FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
>         INNER JOIN DB.DBA.RDF_QUAD AS "s_1_4-t2"
>         ON ( "s_1_4-t1"."S"  = "s_1_4-t2"."O" )
>       WHERE "s_1_4-t1"."P" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible' ,
> 1))
>         AND  (1 - isiri_id ( "s_1_4-t1"."O"))
>         AND  "s_1_4-t1"."O" = DB.DBA.RDF_OBJ_OF_SQLVAL ( 1)
>         AND  "s_1_4-t2"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>         AND  isiri_id ( "s_1_4-t2"."O")
>         AND  "s_1_4-t2"."S"  = "s_1_0-t0"."S"
> OPTION (QUIETCAST)
>      ))
> OPTION (QUIETCAST)
>
> The second query results in an added query on every single result, and
> that additional query also contains an added join.
>
> On my system with 13k tags (yeah, I know), the system is completely
> unusable. Virtuoso pops up to 200% and takes about 5 minutes to respond.
> While I don't expect anyone to have 13k tags, people do have those many
> contacts or emails.
>
> Options on how to fix -
>
> 1. Use graphs with a filter -
>
> select ?r where { graph ?g { ?r a nao:Tag . } FILTER NOT EXISTS { ?g a
> nrl:Ontology. } }
>
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_1-t0"."O")
>   AND  "s_1_1-t0"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  not ( EXISTS ( (
>      SELECT TOP 1 1 AS __ask_retval
>       FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
>       WHERE "s_1_4-t1"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>         AND  isiri_id ( "s_1_4-t1"."O")
>         AND  "s_1_4-t1"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#Ontology' , 1))
>         AND  "s_1_4-t1"."S"  = "s_1_1-t0"."G"
> OPTION (QUIETCAST)
>      )))
> OPTION (QUIETCAST)
>
> This also results in an additional SQL query per resource, but it's still
> a LOT faster (no join in the exists query).
>
> 2.) Use graphs via nao:maintainedBy
>
> select ?r where { graph ?g { ?r a nao:Tag . } ?g nao:maintainedBy ?app . }'
>
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
>   INNER JOIN DB.DBA.RDF_QUAD AS "s_1_0-t1"
>   ON ( "s_1_0-t1"."S"  = "s_1_1-t0"."G" )
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft( '
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
>   AND  isiri_id ( "s_1_1-t0"."O")
>   AND  "s_1_1-t0"."O" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
>   AND  ( "s_1_0-t1"."S" < min_bnode_iri_id ())
>   AND  "s_1_0-t1"."P" = __i2idn ( __bft( '
> http://www.semanticdesktop.org/ontologies/2007/08/15/nao#maintainedBy' ,
> 1))
> OPTION (QUIETCAST)
>
> This would be the ideal solution, however it will kill backward
> compatibility cause all the graph don't have the nao:maintainedBy clause.
>
> 3.) Go SQL and add another column to our RDF_QUAD table which is indexed.
> That way we can always filter statements on the basis of visibility. Would
> be considerably faster than the join.
>
> I suggest we go with option 1 for 4.9, and option 2 for 4.10 and get rid
> of all the user visible stuff.
>
> Any suggestions?
>
> --
> Vishesh Handa
>
>


-- 
Vishesh Handa
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/nepomuk/attachments/20121002/84ab93ba/attachment.html>


More information about the Nepomuk mailing list