[Nepomuk] [RFC] Better Full text search

Sat May 4 15:48:02 UTC 2013

On Saturday 04 May 2013 20:14:37 Vishesh Handa wrote:
> On Sat, May 4, 2013 at 7:47 PM, Ivan Čukić <ivan.cukic at kde.org> wrote:
> > > <res> nie:plainTextContent "title artist album whatevereElse" .
> > 
> > For me, the plainTextContent of a song would be the lyrics. This seems
> > like a misuse of the property. With a very good reason behind it, but
> > still a misuse.
> > 
> > I remember when I wanted to keep all activities in one string property
> > as a \n terminated list to make it speedy :D
> > 
> > I'd say go for it, but only as a last resort.
> 
> I would not like Nepomuk to be a data store. It's not the place to store
> your lyrics to fetch them later, same for emails and files. It is a place
> to store structured data.
> 
> In the case of lyrics, the main reason we are storing them is to be able to
> search through them, not to display them to the user. So we can
> potentially append other data.

Yes and no. Until discardable graphs were introduced, there was not even a
distinction between primary storage and cached stuff. Real life is even more
complicated: you can have local data indexed, you can have remote data
indexed (and it would be very, very nice to have it cached), and for some
stuff Nepomuk is used as the primary storage.

The reason people are trying to stuff Nepomuk with their blobs is very simple:
there's a very real demand for this functionality, and the Nepomuk ontologies
as-is already allow you to store your whole filesystem, including all byte
streams/file contents. So it looks like a very reasonable approach, especially
since nobody actually offers an alternative. OK, Akonadi is the only exception:
it provides caching of remote data, but it's domain-specific.

Imagine a user finding a music video by its lyrics, opening the video only to
discover that they can't see any lyrics, because Nepomuk got its lyrics from
some web extractor. Hence the motivation to use Nepomuk at least as a cache of
data, not only for search purposes.

There's no primary storage for user-generated RDF at all, so the data is
stored in Nepomuk and users are disappointed when something breaks or
disappears.

I'm currently experimenting with solutions to some of these issues, but I
can't move fast due to time constraints. I don't expect anything worth going
public with in the next couple of months at least, and that's if I'm lucky :(