[Nepomuk] [RFC] Better Full text search

phreedom at yandex.ru phreedom at yandex.ru
Sat May 4 15:12:14 UTC 2013


On Saturday 04 May 2013 16:44:40 Christian Mollekopf wrote:
> On Saturday 04 May 2013 16.38:54 Ivan Čukić wrote:
> > > It's actually a property specifically for this purpose:
> > > "Plain-text representation of the content of a InformationElement with
> > > all
> > > markup removed. The main purpose of this property is full-text indexing
> > > and
> > 
> > And title/album/author is the *content* of a song? Not from my POV, but I
> > guess it is open to different interpretations.
> 
> Yes, it's a somewhat lax interpretation ;-) IMO what the paragraph tries to
> tell us is that this property is here to support fulltext searching,
> and that it should never be used to store data which is user visible (or
> even editable).

As the person who wrote this description, I can confirm that the intent was to
support "flat" plain-text search (e.g. via Lucene), so it makes sense to put all
relevant strings into it. Even if Vishesh's query didn't take this long to
execute, you have to realize that the query doesn't stop at the boundary the
user expects it to. It simply traverses the data graph to a fixed depth, which
might be too shallow or too deep, so in practice we might end up using this
plainTextContent hack anyway :(
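
A minimal sketch of what "putting all relevant strings into it" could look
like, assuming a resource is just a dict of properties. The function name, the
field list and the sample song are hypothetical and are not taken from Nepomuk.

```python
# Hypothetical sketch: none of these names come from Nepomuk itself.
def build_plain_text_content(resource, fields=("title", "album", "author")):
    """Join the chosen string fields into one flat blob for full-text indexing."""
    parts = []
    for name in fields:
        value = resource.get(name)
        # Skip missing, empty and non-string values (e.g. a numeric duration).
        if isinstance(value, str) and value.strip():
            parts.append(value.strip())
    return " ".join(parts)


song = {
    "title": "Blue in Green",
    "album": "Kind of Blue",
    "author": "Miles Davis",
    "duration": 337,
}
blob = build_plain_text_content(song)
```

The resulting blob is only meant to be fed to the indexer. Which fields go into
it is an application decision, which is why readers of the property cannot
assume anything about its content.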

> > > search. Its exact content is considered application-specific. The user
> > > can
> > > make no assumptions about what is and what is not contained within.
> > 
> > Ok, this clause does provide a 'this can be used for hacks' statement.
> 
> Indeed.

Moreover, this was intended to discourage reading the property as such, thus
enabling implementations which don't store the property content at all, only
its indexed representation. That way you can avoid any extra storage overhead
and search only the relevant strings, but at the cost of having to figure out
what triggered the match yourself, as opposed to the backend providing
snippets.
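
To illustrate the trade-off, here is a rough sketch of an index-only store: the
text is tokenized into an inverted index and then thrown away, so a search can
only return resource ids and the caller has to work out which field matched.
All names here (IndexOnlyStore, explain_match, the sample data) are
hypothetical, and a real backend such as Lucene is far more involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class IndexOnlyStore:
    """Keeps an inverted index only; the indexed text itself is never stored."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of resource ids

    def index(self, resource_id, plain_text):
        for token in tokenize(plain_text):
            self._postings[token].add(resource_id)
        # plain_text is discarded here, so no snippet can be produced later.

    def search(self, query):
        """Return ids of resources that contain every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


def explain_match(resource, query):
    """Caller-side work: list the string fields that contain any query token."""
    wanted = set(tokenize(query))
    return sorted(
        name
        for name, value in resource.items()
        if isinstance(value, str) and wanted & set(tokenize(value))
    )


song = {"title": "Blue in Green", "album": "Kind of Blue", "author": "Miles Davis"}
store = IndexOnlyStore()
store.index("song:1", " ".join(song.values()))

hits = store.search("miles blue")
fields = explain_match(song, "blue")
```

Here the search itself only yields the id "song:1". Finding out that "blue"
came from the title and the album needs a second pass over the original
properties, which is the extra cost mentioned above.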

