[Nepomuk] [RFC] Better Full text search

Sebastian Trüg trueg at kde.org
Thu May 9 12:27:55 UTC 2013


Virtuoso actually has support for indexing plug-ins. I have not looked into 
this in depth yet, but it would even be possible to plug in an external 
index such as CLucene and use it from within the query engine.

This, however, is a bit more involved. As a first step it might be 
worthwhile to look into the improvements that could be achieved by 
simply customizing the full text index in a way that
1. allows querying all relevant fields in one statement
2. still keeps context information

I will, however, have to research this further to know the extent of 
work required to get to that point. It might be rather simple, or it might 
be harder.
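
As a rough illustration of the two goals above (one query over all fields, 
with context preserved), here is a minimal Python sketch. It does not use 
Virtuoso or its plug-in API; the in-memory index, the function names 
build_index and search, and the sample resources and property names are 
all assumptions made up for the example.

```python
from collections import defaultdict


def build_index(resources):
    """Map each lowercase token to the (resource, property) pairs containing it."""
    index = defaultdict(set)
    for res, props in resources.items():
        for prop, text in props.items():
            for token in text.lower().split():
                # Keeping the property alongside the resource preserves context.
                index[token].add((res, prop))
    return index


def search(index, query):
    """Return {resource: {token: [properties]}} for resources matching every token."""
    tokens = set(query.lower().split())
    hits = {}
    for token in tokens:
        for res, prop in index.get(token, ()):
            hits.setdefault(res, {}).setdefault(token, set()).add(prop)
    # Keep only resources in which every query token was found in some field.
    return {
        res: {tok: sorted(props) for tok, props in ctx.items()}
        for res, ctx in hits.items()
        if len(ctx) == len(tokens)
    }


# Hypothetical sample data; the property names are only illustrative.
resources = {
    "res1": {"title": "Blue Train", "artist": "John Coltrane"},
    "res2": {"title": "Blue Monday", "artist": "New Order"},
}
index = build_index(resources)
result = search(index, "blue coltrane")
```

A single call searches every field at once, and the result still records 
which property each term matched in, which is the context that is lost 
when everything is concatenated into one string.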

Cheers,
Sebastian

On 05/04/2013 03:40 PM, Christian Mollekopf wrote:
> On Saturday 04 May 2013 18.49:05 Vishesh Handa wrote:
>> Hey guys
>>
>
>> I was thinking of moving all the plain text related to a file into the
>> nie:plainTextContent of the resource. So in the case of music we would have
>> -
>>
>> <res> nie:plainTextContent "title artist album whateverElse" .
>>
>> In the case of files, we would append the file name and any other plain
>> text that we want searched to nie:plainTextContent. So a search for
>> any combination of text will just have to search through the plain text
>> content.
>>
>> Opinions?
>
> Hey Vishesh,
>
> I think that's a good idea. We're also already using it that way to be able to
> search through emails with markup in the email feeder, and I see no reason why
> we can't extend that to other resource types (after all the property is
> exactly for this purpose).
> So that means, in the future all feeders should push all information which
> should be matched by full text searching to nie:plainTextContent, right?
>
> The alternative would of course be to use a separate dedicated fulltext index,
> which may have better performance and some more features (tokenizer, stemming
> etc.), but would obviously complicate the setup again (fulltext query => e.g.
> filter by type in Nepomuk => retrieve Akonadi item). So not necessarily the way
> to go, but I wanted to bring it to the table anyway, as it's IMO not
> conflicting with what Nepomuk provides (the semantic analysis), and could
> give better results (performance and feature wise) than letting Virtuoso
> do all the work.
>
>>
>> We can easily do this for the 4.11 release because we already need everyone
>> to re-index everything because of the migration.
>
> Cool.
>
> Cheers,
> Christian

