[Nepomuk] Virtuoso Problems - nao:userVisible
Gaël Beaudoin
gaboo at gaboo.org
Wed Aug 22 08:22:36 UTC 2012
Le 22/08/2012 07:46, Vishesh Handa a écrit :
> Hey everyone
>
> In 4.9, most the queries on large datasets are impossibly slow and
> often cause virtuoso to completely lock up. So I've been going through
> the common queries that are passed to Nepomuk from a user perspective
> and been trying to optimize them.
>
> The most prevalent problem is that of the user visibility.
>
> Simple queries like listing all the tags seem to blow out of
> proportion with the added " FILTER EXISTS { ?r a [ nao:userVisible
> "true"^^xsd:boolean ] . }". If one looks the the SQL that is being
> generated one can see a drastic different
>
> "select ?r where { ?r a nao:Tag . }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft(
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_0-t0"."O")
> AND "s_1_0-t0"."O" = __i2idn ( __bft(
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> OPTION (QUIETCAST)
>
>
> "select ?r where { ?r a nao:Tag . FILTER EXISTS { ?r a [
> nao:userVisible "true"^^xsd:boolean ] . } }"
>
> SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
> WHERE "s_1_0-t0"."P" = __i2idn ( __bft(
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_0-t0"."O")
> AND "s_1_0-t0"."O" = __i2idn ( __bft(
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> AND EXISTS ( (
> SELECT TOP 1 1 AS __ask_retval
> FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
> INNER JOIN DB.DBA.RDF_QUAD AS "s_1_4-t2"
> ON ( "s_1_4-t1"."S" = "s_1_4-t2"."O" )
> WHERE "s_1_4-t1"."P" = __i2idn ( __bft(
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible'
> , 1))
> AND (1 - isiri_id ( "s_1_4-t1"."O"))
> AND "s_1_4-t1"."O" = DB.DBA.RDF_OBJ_OF_SQLVAL ( 1)
> AND "s_1_4-t2"."P" = __i2idn ( __bft(
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_4-t2"."O")
> AND "s_1_4-t2"."S" = "s_1_0-t0"."S"
> OPTION (QUIETCAST)
> ))
> OPTION (QUIETCAST)
>
> The second query results in an added query on every single result, and
> that additional query also contains an added join.
>
> On my system with 13k tags (yeah, I know), the system is completely
> unusable. Virtuoso pops up to 200% and takes about 5 minutes to
> respond. While I don't expect anyone to have 13k tags, people do have
> those many contacts or emails.
>
> Options on how to fix -
>
> 1. Use graphs with a filter -
>
> select ?r where { graph ?g { ?r a nao:Tag . } FILTER NOT EXISTS { ?g a
> nrl:Ontology. } }
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft(
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_1-t0"."O")
> AND "s_1_1-t0"."O" = __i2idn ( __bft(
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> AND not ( EXISTS ( (
> SELECT TOP 1 1 AS __ask_retval
> FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
> WHERE "s_1_4-t1"."P" = __i2idn ( __bft(
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_4-t1"."O")
> AND "s_1_4-t1"."O" = __i2idn ( __bft(
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#Ontology' , 1))
> AND "s_1_4-t1"."S" = "s_1_1-t0"."G"
> OPTION (QUIETCAST)
> )))
> OPTION (QUIETCAST)
>
> This also results in an additional SQL query per resource, but it's
> still a LOT faster (no join in the exists query).
>
> 2.) Use graphs via nao:maintainedBy
>
> select ?r where { graph ?g { ?r a nao:Tag . } ?g nao:maintainedBy ?app
> . }'
> _______________________________________________________________________________
>
> SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
> FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
> INNER JOIN DB.DBA.RDF_QUAD AS "s_1_0-t1"
> ON ( "s_1_0-t1"."S" = "s_1_1-t0"."G" )
> WHERE "s_1_1-t0"."P" = __i2idn ( __bft(
> 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
> AND isiri_id ( "s_1_1-t0"."O")
> AND "s_1_1-t0"."O" = __i2idn ( __bft(
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
> AND ( "s_1_0-t1"."S" < min_bnode_iri_id ())
> AND "s_1_0-t1"."P" = __i2idn ( __bft(
> 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#maintainedBy' ,
> 1))
> OPTION (QUIETCAST)
>
> This would be the ideal solution, however it will kill backward
> compatibility cause all the graph don't have the nao:maintainedBy clause.
>
> 3.) Go SQL and add another column to our RDF_QUAD table which is
> indexed. That way we can always filter statements on the basis of
> visibility. Would be considerably faster than the join.
>
> I suggest we go with option 1 for 4.9, and option 2 for 4.10 and get
> rid of all the user visible stuff.
>
> Any suggestions?
>
> --
> Vishesh Handa
>
>
>
> _______________________________________________
> Nepomuk mailing list
> Nepomuk at kde.org
> https://mail.kde.org/mailman/listinfo/nepomuk
Why not try 3 ? Looks like a simple and obvious solution from my point
of view. I'm not a sparql guy, but used to dealing with sql and
databases. I understand it's not the more elegant solution, but it will
scale much more and fast is never fast enough.
My 2 cents :)
Gaël
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/nepomuk/attachments/20120822/a8146ddf/attachment.html>
More information about the Nepomuk
mailing list