[Nepomuk] [RFC] Better Full text search

Christian Mollekopf chrigi_1 at fastmail.fm
Sat May 4 13:40:44 UTC 2013


On Saturday 04 May 2013 18.49:05 Vishesh Handa wrote:
> Hey guys
> 

> I was thinking of moving all the plain text related to a file into the
> nie:plainTextContent of the resource. So in the case of music we would have
> -
> 
> <res> nie:plainTextContent "title artist album whateverElse" .
> 
> for the case of files, we would append the file name, and any other plain
> text that we want searched just in the nie:plainTextContent. So a search for
> any combination of text will just have to search through the plain text
> content.
> 
> Opinions?

Hey Vishesh,

I think that's a good idea. We already use it that way in the email feeder,
to be able to search through emails that contain markup, and I see no reason
why we can't extend that to other resource types (after all, that is exactly
what the property is for).
So that means that in the future all feeders should push everything that full
text search should match into nie:plainTextContent, right?
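
To make sure we mean the same thing, here is a minimal sketch in plain Python
of what a feeder would do under this proposal. The function names
(build_plain_text_content, matches) and the sample values are made up for
illustration; this is not Nepomuk or feeder API, and the real matching would
be done by the store rather than in Python.

```python
def build_plain_text_content(values):
    """Join the non-empty text values of a resource into one searchable string."""
    parts = []
    for value in values:
        # Collapse runs of whitespace so the result is a single clean line.
        text = " ".join(str(value).split())
        # Skip empty values and exact duplicates.
        if text and text not in parts:
            parts.append(text)
    return " ".join(parts)


def matches(plain_text, query):
    """Return True if every query term occurs as a whole word (case-insensitive)."""
    words = set(plain_text.lower().split())
    return all(term in words for term in query.lower().split())


# Hypothetical music resource: title, artist, album, plus an empty field.
music = build_plain_text_content(
    ["Paranoid Android", "Radiohead", "OK Computer", ""]
)

# Any combination of terms only has to look at the one string.
found = matches(music, "radiohead computer")
not_found = matches(music, "radiohead kid")
```

The point is only that a query for any combination of terms has a single
property to look at, regardless of the resource type.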

The alternative would of course be a separate, dedicated full text index,
which may perform better and offer more features (tokenizer, stemming, etc.),
but would obviously complicate the setup again (full text query => e.g.
filter by type in Nepomuk => retrieve Akonadi item). So it's not necessarily
the way to go, but I wanted to put it on the table anyway, as IMO it doesn't
conflict with what Nepomuk provides (the semantic analysis) and could give
better results (performance- and feature-wise) than letting Virtuoso do all
the work.
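
For illustration, a rough sketch of that three-step flow in plain Python. All
of it is an assumption made up for this mail: the FullTextIndex class, the
resource ids and the two dictionaries merely stand in for a real full text
engine, the Nepomuk type lookup and the Akonadi item fetch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very small tokenizer: lowercase word characters only, no stemming."""
    return re.findall(r"\w+", text.lower())


class FullTextIndex:
    """Toy inverted index mapping each token to the ids that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, resource_id, text):
        for token in tokenize(text):
            self._postings[token].add(resource_id)

    def search(self, query):
        """Return the ids whose text contains every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


# Stand-ins for the type information and the items (hypothetical data).
resource_types = {"res1": "email", "res2": "music", "res3": "email"}
items = {"res1": "akonadi item 1", "res3": "akonadi item 3"}

index = FullTextIndex()
index.add("res1", "Meeting notes about the search index")
index.add("res2", "Search and Destroy")
index.add("res3", "Holiday photos")

hits = index.search("search")                                      # 1. full text query
emails = sorted(r for r in hits if resource_types[r] == "email")   # 2. filter by type
retrieved = [items[r] for r in emails]                             # 3. retrieve item
```

Every query goes through three hops instead of one, which is the extra
complexity I mean.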

> 
> We can easily do this for the 4.11 release because we already need everyone
> to re-index everything because of the migration.

Cool.

Cheers,
Christian

