[Nepomuk] Virtuoso Problems - nao:userVisible
Vishesh Handa
me at vhanda.in
Wed Aug 22 05:46:00 UTC 2012
Hey everyone
In 4.9, most of the queries on large datasets are impossibly slow and often
cause Virtuoso to lock up completely. So I've been going through the common
queries that are passed to Nepomuk from a user's perspective and trying
to optimize them.
The most prevalent problem is that of the user visibility.
Simple queries like listing all the tags seem to blow out of proportion
with the added "FILTER EXISTS { ?r a [ nao:userVisible "true"^^xsd:boolean
] . }". If one looks at the SQL that is being generated, one can see a
drastic difference:
"select ?r where { ?r a nao:Tag . }"
SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
WHERE "s_1_0-t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
AND isiri_id ( "s_1_0-t0"."O")
AND "s_1_0-t0"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
OPTION (QUIETCAST)
"select ?r where { ?r a nao:Tag . FILTER EXISTS { ?r a [ nao:userVisible "true"^^xsd:boolean ] . } }"
SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
WHERE "s_1_0-t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
AND isiri_id ( "s_1_0-t0"."O")
AND "s_1_0-t0"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
AND EXISTS ( (
SELECT TOP 1 1 AS __ask_retval
FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
INNER JOIN DB.DBA.RDF_QUAD AS "s_1_4-t2"
ON ( "s_1_4-t1"."S" = "s_1_4-t2"."O" )
WHERE "s_1_4-t1"."P" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible' , 1))
AND (1 - isiri_id ( "s_1_4-t1"."O"))
AND "s_1_4-t1"."O" = DB.DBA.RDF_OBJ_OF_SQLVAL ( 1)
AND "s_1_4-t2"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
AND isiri_id ( "s_1_4-t2"."O")
AND "s_1_4-t2"."S" = "s_1_0-t0"."S"
OPTION (QUIETCAST)
))
OPTION (QUIETCAST)
The second query results in an added query on every single result, and that
additional query also contains an added join.
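To make the shape of the problem concrete, here is a minimal sketch that reproduces the two query forms on a toy table. It assumes SQLite (via Python's sqlite3) as a stand-in for Virtuoso, and the table name `quad`, its plain-text columns, and the sample rows are all hypothetical. It only illustrates the correlated EXISTS with its self-join; it says nothing about Virtuoso's actual timings.

```python
import sqlite3

# Toy stand-in for DB.DBA.RDF_QUAD (assumption: plain text columns
# instead of Virtuoso's IRI ids; table name and rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quad (g TEXT, s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO quad VALUES (?, ?, ?, ?)",
    [
        ("g:onto", "nao:Tag", "nao:userVisible", "true"),
        ("g:data", "tag:kde", "rdf:type", "nao:Tag"),
        ("g:data", "tag:work", "rdf:type", "nao:Tag"),
    ],
)

# Plain tag listing: a single scan over the type statements.
PLAIN = """
SELECT t0.s FROM quad AS t0
WHERE t0.p = 'rdf:type' AND t0.o = 'nao:Tag'
ORDER BY t0.s
"""

# Same listing with the visibility filter: for every candidate row an
# EXISTS subquery runs, and that subquery joins the table with itself.
FILTERED = """
SELECT t0.s FROM quad AS t0
WHERE t0.p = 'rdf:type' AND t0.o = 'nao:Tag'
  AND EXISTS (
    SELECT 1 FROM quad AS t1
    INNER JOIN quad AS t2 ON t1.s = t2.o
    WHERE t1.p = 'nao:userVisible' AND t1.o = 'true'
      AND t2.p = 'rdf:type'
      AND t2.s = t0.s
  )
ORDER BY t0.s
"""

plain = [row[0] for row in conn.execute(PLAIN)]
filtered = [row[0] for row in conn.execute(FILTERED)]
```

On this toy data both queries return the same tags; the difference is only in how much work is done per result row.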
On my system with 13k tags (yeah, I know), the system is completely
unusable. Virtuoso's CPU usage jumps to 200% and it takes about 5 minutes to
respond. While I don't expect anyone to have 13k tags, people do have that
many contacts or emails.
Options on how to fix this:
1.) Use graphs with a filter
select ?r where { graph ?g { ?r a nao:Tag . } FILTER NOT EXISTS { ?g a nrl:Ontology. } }
_______________________________________________________________________________
SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
WHERE "s_1_1-t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
AND isiri_id ( "s_1_1-t0"."O")
AND "s_1_1-t0"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
AND not ( EXISTS ( (
SELECT TOP 1 1 AS __ask_retval
FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
WHERE "s_1_4-t1"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
AND isiri_id ( "s_1_4-t1"."O")
AND "s_1_4-t1"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#Ontology' , 1))
AND "s_1_4-t1"."S" = "s_1_1-t0"."G"
OPTION (QUIETCAST)
)))
OPTION (QUIETCAST)
This also results in an additional SQL query per resource, but it's still a
LOT faster (no join in the exists query).
2.) Use graphs via nao:maintainedBy
select ?r where { graph ?g { ?r a nao:Tag . } ?g nao:maintainedBy ?app . }
_______________________________________________________________________________
SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
INNER JOIN DB.DBA.RDF_QUAD AS "s_1_0-t1"
ON ( "s_1_0-t1"."S" = "s_1_1-t0"."G" )
WHERE "s_1_1-t0"."P" = __i2idn ( __bft( 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
AND isiri_id ( "s_1_1-t0"."O")
AND "s_1_1-t0"."O" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
AND ( "s_1_0-t1"."S" < min_bnode_iri_id ())
AND "s_1_0-t1"."P" = __i2idn ( __bft( 'http://www.semanticdesktop.org/ontologies/2007/08/15/nao#maintainedBy' , 1))
OPTION (QUIETCAST)
This would be the ideal solution. However, it will break backward
compatibility because not all graphs have a nao:maintainedBy statement.
3.) Go SQL and add another, indexed column to our RDF_QUAD table.
That way we can always filter statements on the basis of visibility. This
would be considerably faster than the join.
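As a rough sketch of what option 3 could look like, the snippet below adds an indexed visibility column to a toy quad table. It assumes SQLite (via Python's sqlite3) rather than Virtuoso, and the table name `quad`, the column name `visible`, the index, and the sample rows are all hypothetical. Whether RDF_QUAD can actually be extended this way in Virtuoso is not something this sketch demonstrates.

```python
import sqlite3

# Hypothetical stand-in for RDF_QUAD with the proposed extra column.
# "visible" is the suggested addition, not an existing Virtuoso column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quad ("
    " g TEXT, s TEXT, p TEXT, o TEXT,"
    " visible INTEGER NOT NULL DEFAULT 1)"
)
# Index that leads with the visibility flag so it can be used directly
# in the WHERE clause instead of a per-row subquery.
conn.execute("CREATE INDEX quad_visible_p_o ON quad (visible, p, o)")

conn.executemany(
    "INSERT INTO quad VALUES (?, ?, ?, ?, ?)",
    [
        ("g:data", "tag:kde", "rdf:type", "nao:Tag", 1),
        ("g:data", "tag:work", "rdf:type", "nao:Tag", 1),
        # Illustrative rows flagged as not user visible.
        ("g:data", "tag:hidden", "rdf:type", "nao:Tag", 0),
        ("g:onto", "nao:Tag", "rdf:type", "rdfs:Class", 0),
    ],
)


def visible_tags(connection):
    """List tags using only the flag column: no EXISTS, no join."""
    cursor = connection.execute(
        "SELECT s FROM quad"
        " WHERE visible = 1 AND p = ? AND o = ?"
        " ORDER BY s",
        ("rdf:type", "nao:Tag"),
    )
    return [row[0] for row in cursor]


tags = visible_tags(conn)
```

The visibility check becomes an ordinary indexed predicate on the same row, so there is no additional query per result.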
I suggest we go with option 1 for 4.9, and option 2 for 4.10 and get rid of
all the user visible stuff.
Any suggestions?
--
Vishesh Handa