Nepomuk in 4.13 and beyond

Vishesh Handa me at vhanda.in
Fri Dec 13 11:13:19 GMT 2013


On Thursday 12 Dec 2013 21:23:51 Aaron J. Seigo wrote:
> On Thursday, December 12, 2013 20:10:27 Vishesh Handa wrote:
> > On Thursday 12 Dec 2013 19:40:11 Ivan Čukić wrote:
> > > > If we all decide to store stuff in sqlite, then it doesn't matter if
> > > > they
> > > > are separate database files or the same one.
> > > 
> > > I might be missing a few things here, but asking questions is the road
> > > to
> > > enlightenment :)
> > > 
> > > - There is no way to query across different stores, which was the main
> > > appeal of nepomuk? (I concluded this from the last mail)
> > 
> > There isn't one. Not right now. I'm open to ideas on how to do something
> > like if it is required. I'm slightly skeptical if it actually is required.
> 
> for activities it’s pretty much a requirement: we have an activity and we
> want to know all resources (files, contacts, bookmarks, applications,
> windows ..) associated with it. so for activities we’ll either end up
> querying each store separately or Baloo will need to provide a way to query
> multiple stores.
> 
> for the Plasma Active shell as it currently is, single-store querying might
> be workable as we tend to keep most of the different resources separated in
> the UI (though that’s one thing i want to change in future releases, so you
> can group a set of bookmarks with a given file, e.g.)

I'm slightly confused.

Please correct me if I haven't understood the problem correctly - You have an 
activity and you have a number of different resources related to that activity. 
The resource can be a file/contact/application/bookmark/anything.

To store this, you could just store a mapping between activity id and 
resource id, almost identical to what we have for tags. If this were stored in 
SQL, fetching everything related to an activity would be:

select * from activityRelation where activityId = 'identifier';

Then, when displaying each of these resources, you would need to query the 
individual stores they are in. For contacts and emails, this would be Akonadi. 
For files, there is a FileFetchJob, etc.
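
As a rough sketch of the mapping idea above, in Python with sqlite: the table 
layout, the "store:id" prefix convention for resource ids, and the function 
names are all assumptions made for illustration, not anything that exists in 
Baloo. It shows one lookup on the relation table, with the results grouped by 
store so that each store only needs to be asked once.

```python
import sqlite3
from collections import defaultdict


def open_store():
    """Create an in-memory database with a hypothetical activityRelation table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE activityRelation ("
        " activityId TEXT NOT NULL,"
        " resourceId TEXT NOT NULL,"
        " PRIMARY KEY (activityId, resourceId))"
    )
    return db


def link(db, activity_id, resource_id):
    """Associate a resource with an activity; duplicate links are ignored."""
    db.execute(
        "INSERT OR IGNORE INTO activityRelation VALUES (?, ?)",
        (activity_id, resource_id),
    )


def resources_by_store(db, activity_id):
    """Return {store: [resource ids]} for one activity.

    Assumes resource ids look like "store:local-id"; that prefix is an
    illustrative convention only.
    """
    grouped = defaultdict(list)
    rows = db.execute(
        "SELECT resourceId FROM activityRelation"
        " WHERE activityId = ? ORDER BY resourceId",
        (activity_id,),
    )
    for (resource_id,) in rows:
        store, _, _ = resource_id.partition(":")
        grouped[store].append(resource_id)
    return dict(grouped)


db = open_store()
link(db, "work", "file:/home/user/report.odt")
link(db, "work", "akonadi:42")
link(db, "work", "bookmark:7")
link(db, "play", "file:/home/user/game.sav")
print(resources_by_store(db, "work"))
```

Each group would then go to the matching store (Akonadi, a file fetch job, 
and so on) as a single batched request rather than one request per resource.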

> 
> it would be a big problem if the tags are per-store as well; we need cross-
> store tags (though from glancing at the API tonight it looks like that is
> already there?)
> 

Yes. But I'm having second thoughts about this.

> this may be a question of API, of course. with different stores, collation
> will need to happen somewhere. should it happen on the client side or the
> server side is, i suppose, the big question.
> 
> i would suggest server side for a simple reason: if multiple stores all
> share the same physical storage system, it would be really nice to be able
> to optimize queries to hit that storage system as little as possible.
> example:
> 
> Stores: S0, S1, S2
> 
> S0 -> xapian
> S1 -> xapian
> S2 -> mysql
> 
> when fetching items from S0 and S1 that match tag T0, it would be very nice
> if the backends could cooperate to merge their queries into one so that one
> xapian query is done rather than 2 with post-query collation of the
> results.
> 
> for obvious reasons this can only be done in the server where the stores can
> cooperate.
> 
> a concrete use case:
> 
> S0 = files
> S1 = bookmarks
> S2 = applications
> 
> application = Plasma Active shell
> 
> if adding stores is easy enough, i expect we’ll end up with stores for
> things like geolocation, so this could balloon further.
> 
> > > - When querying, how do I get the properties of the results?
> > 
> > You don't. You just get the identifier and some text. You can do a
> > subsequent fetch job to get additional data.
> 
> more roundtrips doesn’t sound great for performance. if a result set has a
> 1000 returned items and you then want to get properties on them (e.g. for
> listing and sorting) then one needs to either send all 1000 UIDs back for
> further processing or in a worst case scenario 1000 individual requests.
> 
> this will be an issue for several things in Plasma Active, such as the file
> manager. unlike Dolphin which just shows metadata for a given file, the
> Active Files app relies on Nepomuk rather than the filesystem for these
> things and allows filtering by ratings, tags, etc.
> 
> > > - We talked about asynchronous querying. Is it going to happen?
> > 
> > There is a QueryRunnable class which can be used to run queries in another
> > thread. Most backends, do not seem to allow asynchronous queries, so there
> > wasn't a way to run queries asynchronously by default.
> 
> those backends could be run in a thread? iow, put the async/threading as a
> first class feature that the backends must implement. even if it means
> having a thread for execution in the background and queueing requests.
> 
> making every user handle the threading sounds like we’ll have lots of code
> that doesn’t ;)
>

Perhaps. There is always a trade-off between keeping backend implementations 
simple and having complex library code.
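
To make that trade-off concrete, here is a minimal sketch in Python of the 
"complex library code" side: the backend stays synchronous and a library-level 
wrapper queues requests onto a single worker thread. BlockingBackend and 
AsyncStore are hypothetical names for illustration, not Baloo classes.

```python
from concurrent.futures import ThreadPoolExecutor


class BlockingBackend:
    """Hypothetical backend that only offers a synchronous query call."""

    def __init__(self, items):
        self._items = items

    def query(self, term):
        # Stand-in for a real (blocking) index lookup.
        return [item for item in self._items if term in item]


class AsyncStore:
    """Library-side wrapper: one worker thread queues requests to the backend."""

    def __init__(self, backend):
        self._backend = backend
        # A single worker means the backend never sees concurrent calls.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def query(self, term):
        # Returns a Future immediately; the backend runs in the worker thread.
        return self._executor.submit(self._backend.query, term)

    def close(self):
        self._executor.shutdown(wait=True)


store = AsyncStore(BlockingBackend(["notes.txt", "photo.jpg", "notes-old.txt"]))
future = store.query("notes")   # does not block the caller
results = future.result()       # block only when the results are needed
store.close()
print(results)
```

With this shape the backend author writes only the blocking query, and the 
threading lives in one place in the library instead of in every client.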

> > > From my POV, it would be much nicer if you forced a single db (as an
> > > actual
> > > store, not as a cache like nepomuk is for akonadi) on the people, with
> > > the
> > > option to have a few things runtime defined. It would ease the
> > > development
> > > and would allow more fun queries which would be optimized unlike the
> > > manual
> > > client-side joining of different query results.
> > 
> > But what if one doesn't use SQL for storing data? IMO Xapian is much
> > better
> > suited that sqlite's FTS support (or mysql).
> 
> hopefully there would be a query object and people would not be hand coding
> queries in strings that is passed to be parsed. which would make the “what
> is the query language” thing moot; the sparql queries in C++ is one thing i
> never really got comfortable with with nepomuk.
> 

+1
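
For illustration, a query object along those lines could look roughly like 
the Python sketch below. The Query fields, the to_sql helper, and the 
resources/tagRelation tables are all made up for this example. The point is 
only that callers fill in fields and each backend translates them as it sees 
fit (SQL here, but it could equally be a Xapian query).

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical query object: callers set fields, not query strings."""
    types: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    limit: int = 0  # 0 means "no limit"


def to_sql(query):
    """One possible translation for an SQL backend.

    Returns (statement, parameters); the table and column names are
    assumptions for this sketch.
    """
    clauses, params = [], []
    if query.types:
        placeholders = ", ".join("?" * len(query.types))
        clauses.append("type IN (%s)" % placeholders)
        params.extend(query.types)
    for tag in query.tags:
        clauses.append("id IN (SELECT resourceId FROM tagRelation WHERE tag = ?)")
        params.append(tag)
    sql = "SELECT id FROM resources"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if query.limit > 0:
        sql += " LIMIT ?"
        params.append(query.limit)
    return sql, params


sql, params = to_sql(Query(types=["File"], tags=["holiday"], limit=10))
print(sql)
print(params)
```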

> > When planning Baloo, I've mostly taken a look at PIM, Dolphin, KRunner
> > (and
> > Milou), PMC, and KPeople. Perhaps something was missed?
> 
> usage in activities and Plasma Active are key use cases from my POV.

If you want, we can discuss this over a hangout or IRC, where there is a 
smaller delay between responses. 

What time would be suitable for everyone?
 
-- 
Vishesh Handa



