[Nepomuk] [RFC] Better Full text search

Vishesh Handa me at vhanda.in
Sat May 4 13:19:05 UTC 2013


Hey guys

I've been thinking about this for a couple of weeks now. We basically do
not handle text-based searches very well, specifically when the data is
spread across multiple resources.

For example - a music file has its artist and album stored in separate
resources. So doing a search where I mention - "title artist album" is very
hard to do.

select ?r where {
  {
    { ?r ?p1 ?o1 .
      ?o1 bif:contains "title" .
    }
    UNION {
      ?r ?p1 ?x1 .
      ?x1 ?q1 ?o1 .
      ?o1 bif:contains "title" .
    }
  }
  {
    { ?r ?p2 ?o2 .
      ?o2 bif:contains "artist" .
    }
    UNION {
      ?r ?p2 ?x2 .
      ?x2 ?q2 ?o2 .
      ?o2 bif:contains "artist" .
    }
  }
  {
    { ?r ?p3 ?o3 .
      ?o3 bif:contains "album" .
    }
    UNION {
      ?r ?p3 ?x3 .
      ?x3 ?q3 ?o3 .
      ?o3 bif:contains "album" .
    }
  }
}


This query is a monster and takes quite some time to execute. About 26
seconds on my system. Even when you're doing a simple search for one word
it is still something like this -

select distinct ?r where {
    { ?r ?p ?o .
      ?o bif:contains "word" .
    }
    UNION {
        ?r ?p ?o1 .
        ?o1 ?p2 ?o .
        ?o bif:contains "word" .
    }
}

which is again kind of slow because we aren't using any of the indexes of
the statements.
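
To make the growth concrete, here is a rough Python sketch that generates this kind of query for any list of words. The helper names (word_group, build_union_query) and the numbered variable names are made up for illustration and are not Nepomuk code. Each word is wrapped in single quotes, which is how Virtuoso text expressions usually quote a word.

```python
def word_group(word: str, n: int) -> str:
    """Build one UNION group: `word` matches on ?r directly or one hop away."""
    # Hypothetical naming scheme: every word gets its own numbered variables
    # so that the groups only join on ?r.
    o, x, p, q = f"?o{n}", f"?x{n}", f"?p{n}", f"?q{n}"
    return (
        "  {\n"
        f"    {{ ?r {p} {o} . {o} bif:contains \"'{word}'\" . }}\n"
        "    UNION\n"
        f"    {{ ?r {p} {x} . {x} {q} {o} . {o} bif:contains \"'{word}'\" . }}\n"
        "  }"
    )


def build_union_query(words: list[str]) -> str:
    """Join one UNION group per search word into a single query string."""
    groups = "\n".join(word_group(w, i) for i, w in enumerate(words, start=1))
    return "select distinct ?r where {\n" + groups + "\n}"


print(build_union_query(["title", "artist", "album"]))
```

Every extra word adds another UNION with two full-text matches and unbound predicates, so the query only gets heavier as the search gets more specific.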

I was thinking of moving all the plain text related to a file into the
nie:plainTextContent of the resource. So in the case of music we would have
-

<res> nie:plainTextContent "title artist album whateverElse" .

For files, we would append the file name and any other plain text that we
want searched to nie:plainTextContent. So a search for any combination of
words will just have to search through the plain text content.
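
A minimal Python sketch of what this could look like, under a few assumptions: flatten_text and build_flat_query are hypothetical helpers, not existing Nepomuk API, the words are combined with AND in a single Virtuoso text expression, and the nie prefix is spelled out in case it is not predefined.

```python
# Namespace of the NIE ontology that nie:plainTextContent belongs to.
NIE = "http://www.semanticdesktop.org/ontologies/2007/01/19/nie#"


def flatten_text(parts: list[str]) -> str:
    """Join all searchable strings into one value, skipping empties and repeats."""
    seen: list[str] = []
    for part in parts:
        part = " ".join(part.split())  # collapse runs of whitespace
        if part and part not in seen:
            seen.append(part)
    return " ".join(seen)


def build_flat_query(words: list[str]) -> str:
    """Build a query that matches all words against one literal per resource."""
    expr = " AND ".join(f"'{w}'" for w in words)
    return (
        f"PREFIX nie: <{NIE}>\n"
        "select distinct ?r where {\n"
        "  ?r nie:plainTextContent ?text .\n"
        f"  ?text bif:contains \"{expr}\" .\n"
        "}"
    )


print(flatten_text(["Song Title", "Some Artist", "Some Album", "song.mp3"]))
print(build_flat_query(["title", "artist", "album"]))
```

However many words are searched for, the query stays at one fixed-predicate pattern and one full-text match.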

Opinions?

We can easily do this for the 4.11 release because we already need everyone
to re-index everything for the migration.

-- 
Vishesh Handa