[Nepomuk] [RFC] Better Full text search

phreedom at yandex.ru phreedom at yandex.ru
Sat May 4 14:09:59 UTC 2013


On Saturday 04 May 2013 18:49:05 Vishesh Handa wrote:
> Even when you're doing a simple search for one word
> it is still something like this -
> 
> select distinct ?r where {
>     { ?r ?p ?o .
>       bif:contains(?o, "word") .
>     }
>     UNION {
>         ?r ?p ?o1 .
>         ?o1 ?p2 ?o .
>         bif:contains(?o, "word") .
>     }
> }
> 
> which is again kinda slow cause we aren't using any of the indexes of the
> statements.
> 
> I was thinking of moving all the plain text related to a file into the
> nie:plainTextContent of the resource. So in the case of music we would have
> -
> 
> <res> nie:plainTextContent "title artist album whateverElse" .
> 
> for the case of files, we would append the file name, and any other plain
> text that we want searched just in the nie:plainTextContent. So a search for
> any combination of text will just have to search through the plain text
> content.
> 
> Opinions?
> 
> We can easily do this for the 4.11 release cause we already need everyone
> to re-index everything cause of the migration.

Have you asked the Virtuoso devs?
As long as nobody tries to use plainTextContent as primary storage, it
shouldn't be a problem. But if structured data goes out of the window, we
might as well use CLucene to get good performance :(
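
To make the trade-off concrete, here is a minimal sketch of the flattening idea from the quoted proposal. It is plain Python, not Nepomuk or Virtuoso code, and the helper names `flatten_text` and `matches` as well as the sample metadata are made up for illustration:

```python
# Hypothetical helpers; none of these names come from Nepomuk or Virtuoso.

def flatten_text(fields):
    """Join every non-empty text value of a resource into one searchable string."""
    return " ".join(value for value in fields.values() if value)


def matches(plain_text, query):
    """Return True if every word of the query occurs as a word in the flattened text."""
    words = set(plain_text.lower().split())
    return all(term in words for term in query.lower().split())


# Example resource: a music file with structured metadata (sample data).
song = {"title": "Blue Train", "artist": "John Coltrane", "album": "Blue Train"}
content = flatten_text(song)

print(content)
print(matches(content, "coltrane blue"))  # words taken from different fields
print(matches(content, "miles"))          # word that appears in no field
```

A search for any combination of words only has to look at the one flattened string, which is the point of the proposal. The same string no longer records which field a word came from, so it only works as a search aid next to the structured statements, not as a replacement for them.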

