[Kde-pim] Akonadi Nepomuk Email Searching

Volker Krause vkrause at kde.org
Sun Dec 16 12:42:05 GMT 2012


Hi Vishesh,

sorry for the delay.

On Monday 10 December 2012 22:36:52 Vishesh Handa wrote:
> Hey Volker
> 
> On Sun, Dec 9, 2012 at 4:28 PM, Volker Krause <vkrause at kde.org> wrote:
> > Hi Vishesh,
> > 
> > sorry, missed your question about the change notifications in persistent
> > search collections on IRC last night.
> > 
> > Currently the Akonadi server doesn't know if someone is monitoring a
> > specific collection; the filtering happens purely on the client side.
> > 
> > That's obviously not optimal, even without considering Nepomuk. Therefore
> > Michael Jansen added the necessary interfaces for server-side filtering of
> > change notifications some time ago. We still don't actually filter, though;
> > that part is still missing.
> 
> Could you elaborate? I'm not sure I understand "interfaces for server-side
> filtering of change notifications". From what I gather, it means that
> clients can subscribe to the change notifications that they are interested
> in.

Exactly. This would allow the server to only monitor persistent search folders 
for changes if a client actually cares. Right now the server just broadcasts 
everything and clients filter themselves (cf. Akonadi::Monitor).
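
Roughly, the client-side setup looks like this today (just a sketch; the 
collection variable and slot names are placeholders):

#include <Akonadi/Monitor>
#include <Akonadi/Collection>
#include <Akonadi/Item>

// Client-side filtering as it works now: the server broadcasts all change
// notifications and Akonadi::Monitor simply drops everything the client did
// not subscribe to via setCollectionMonitored()/setItemMonitored().
Akonadi::Monitor *monitor = new Akonadi::Monitor( this );
monitor->setCollectionMonitored( searchCollection ); // e.g. the persistent search folder

connect( monitor, SIGNAL(itemAdded(Akonadi::Item,Akonadi::Collection)),
         this, SLOT(slotItemAdded(Akonadi::Item,Akonadi::Collection)) );
connect( monitor, SIGNAL(itemRemoved(Akonadi::Item)),
         this, SLOT(slotItemRemoved(Akonadi::Item)) );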

> 
> > However, as soon as KMail is running, someone would be watching these
> > folders.
> > How expensive is that?
> 
> No. Someone would only be watching that folder when that folder is open. Is
> that not the case?

The KMail folder tree shows unread counts and similar statistics for all mail 
folders. For that to be up to date we need to monitor for changes. There's a 
dedicated signal for statistics changes in the API, in case that's cheaper to 
query than what actually changed in the content. But I doubt that makes a 
difference here.
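
That statistics path is another signal on the same Akonadi::Monitor, roughly 
like this (sketch; the slot is a placeholder):

#include <Akonadi/Monitor>
#include <Akonadi/CollectionStatistics>

// Only aggregate numbers (unread count, total count, size) are delivered here,
// not the changed items themselves, which is why it might be cheaper.
Akonadi::Monitor *statMonitor = new Akonadi::Monitor( this );
statMonitor->fetchCollectionStatistics( true );

connect( statMonitor,
         SIGNAL(collectionStatisticsChanged(Akonadi::Collection::Id,Akonadi::CollectionStatistics)),
         this,
         SLOT(slotStatisticsChanged(Akonadi::Collection::Id,Akonadi::CollectionStatistics)) );
// In the slot, statistics.unreadCount() / statistics.count() feed the folder tree.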

> From a Nepomuk point of view, all of this is super expensive. We have a
> very limited concept of change notifications. You can be notified when
> certain resources are added or removed, and when certain properties are
> added / removed. That is all. We cannot know when the results of a
> particular search have changed.
> 
> The way we do it is by running the entire query again when data in the
> Nepomuk db changes. Before 4.9.1, we used to do this every time ANY data in
> the Nepomuk db changed, meaning that even when a file was indexed or the
> kactivitydaemon set some obscure value in the activity resource, the email
> query would be re-run.
> 
> After 4.9.1, we apply some heuristics and only try to re-run the query when
> the relevant data changes. For email search, emails count as relevant data.
> Therefore the query is re-run each time an email is indexed.
> 
> For small queries, this isn't a big problem, but it is a huge problem for
> queries with large results or complex queries that take a long time to
> process.
> 
> So, now, how do we proceed? Do we completely disable real-time update of
> search collections?

Hm, this is very expensive indeed. Instead of degrading the entire system we 
should rather disable or limit query re-running. Is there a way to manually 
trigger a re-run? I guess we could do that similarly to on-demand syncing, as a 
workaround until we find a better solution.
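
For reference, the live subscription boils down to something like this (a 
simplified sketch, not the actual NepomukSearch code; the slot names are made 
up). A manual re-run would then essentially be close() followed by a fresh 
query:

#include <Nepomuk/Query/QueryServiceClient>
#include <Nepomuk/Query/Result>

// Every re-evaluation Nepomuk performs arrives through these signals.
Nepomuk::Query::QueryServiceClient *client = new Nepomuk::Query::QueryServiceClient( this );
connect( client, SIGNAL(newEntries(QList<Nepomuk::Query::Result>)),
         this, SLOT(addResultsToSearchCollection(QList<Nepomuk::Query::Result>)) );
connect( client, SIGNAL(entriesRemoved(QList<QUrl>)),
         this, SLOT(removeResultsFromSearchCollection(QList<QUrl>)) );
client->sparqlQuery( searchQuery );

// On-demand instead of live would roughly be:
//   client->close();
//   client->sparqlQuery( searchQuery );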

Live-updating search folders are a killer feature I really would like to see 
eventually (not just for emails), so I'm wondering if we can make this work by 
introducing some constraints on the queries, or by giving external entities 
(Akonadi in this case) more control over when a query is re-evaluated. Some 
naive ideas:

- if we constrain queries to be "self-contained", i.e. to only depend on 
properties of a specific item, not on properties of an item it relates to, it 
should be enough to evaluate the query against only the new/changed resource, 
rather than globally, right? E.g. "mails with tag foo" would work this way, 
"mails from people with tag foo" would not (see the first sketch below). I'd 
imagine this could bring a massive scalability improvement. Detecting whether 
a query complies with these constraints should even be possible automatically.

- compress/schedule query re-runs. For a batch of, e.g., 100 mails to index we 
only need to re-run the query once. A 500 msec query would turn from unusable 
to quite fine this way, I think (see the second sketch further down). This 
works even better if users (such as Akonadi) can provide their knowledge about 
good timings for the re-run, e.g. after a mail sync has finished.
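
To make the first idea a bit more concrete, here is a sketch with the Nepomuk 
query API (I deliberately left out the exact NMO sender property, its accessor 
name would need checking):

#include <Nepomuk/Tag>
#include <Nepomuk/Query/Query>
#include <Nepomuk/Query/ComparisonTerm>
#include <Nepomuk/Query/ResourceTerm>
#include <Soprano/Vocabulary/NAO>

// "mails with tag foo": the term only touches a property of the mail resource
// itself, so it could be evaluated against just the one new/changed resource.
Nepomuk::Query::Query buildSelfContainedQuery()
{
    Nepomuk::Tag fooTag( QLatin1String( "foo" ) );
    return Nepomuk::Query::Query(
        Nepomuk::Query::ComparisonTerm( Soprano::Vocabulary::NAO::hasTag(),
                                        Nepomuk::Query::ResourceTerm( fooTag ),
                                        Nepomuk::Query::ComparisonTerm::Equal ) );
}

// "mails from people with tag foo" would instead nest a term on the related
// sender resource, e.g.
//   ComparisonTerm( <NMO sender property>,
//                   ComparisonTerm( NAO::hasTag(), ResourceTerm( fooTag ) ) )
// -- there a change to any contact can alter the result set, so per-resource
// evaluation is no longer sufficient.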

I know too little about Nepomuk internals to judge if this is feasible or 
would actually address the problem though.
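
The second idea at least would not need anything from Nepomuk internals; a 
minimal sketch (class and member names are made up):

#include <QObject>
#include <QTimer>

// Collapses a burst of "relevant data changed" notifications (e.g. 100 freshly
// indexed mails) into a single query re-run, and lets a client force one at a
// known-good moment (e.g. after a mail sync has finished).
class QueryRerunScheduler : public QObject
{
    Q_OBJECT
public:
    explicit QueryRerunScheduler( QObject *parent = 0 ) : QObject( parent )
    {
        m_timer.setSingleShot( true );
        m_timer.setInterval( 5000 ); // let the batch settle; tune as needed
        connect( &m_timer, SIGNAL(timeout()), this, SIGNAL(rerunQuery()) );
    }

public slots:
    void scheduleRerun() { m_timer.start(); }              // call on every change notification
    void rerunNow() { m_timer.stop(); emit rerunQuery(); } // e.g. after a mail sync

signals:
    void rerunQuery(); // connect this to the actual query re-evaluation

private:
    QTimer m_timer;
};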

regards,
Volker

> > On Sunday 09 December 2012 00:13:48 Andras Mantia wrote:
> > > Hi,
> > > 
> > > Vishesh Handa wrote:
> > > > Hey Laurent
> > > > 
> > > > Over the last couple of weeks, I've slowly started learning more about
> > > > Akonadi, and I've started using it. The good/bad news is that I have
> > > > started experiencing all the issues people complain about, so hopefully
> > > > I can fix them.
> > > > 
> > > > I tried to search for an email using KMail, and it worked out fine.
> > > > However, I notice that KMail remembers some of the searches that were
> > > > performed. I know it uses the QueryServiceClient for this; however, I
> > > > cannot find the relevant code. Could you perhaps point me towards it?
> > > > 
> > > > The way the QueryServiceClient is used is bad, because we (Nepomuk) end
> > > > up re-running the query each time an email is indexed. That makes
> > > > Virtuoso consume more CPU.
> > > 
> > > I'm not that familiar with the searches, but by looking at the code:
> > > - when a search is performed, a virtual collection is created with the
> > > search query
> > > - this is done using Akonadi::SearchCreateJob
> > > (kdepimlibs/akonadi/searchcreatejob.cpp)
> > > - the above triggers the virtual collection creation in the Akonadi
> > > server. See akonadi/server/src/handler/searchpersistent.cpp
> > > - when a search is performed inside Akonadi, it goes through
> > > akonadi/server/src/handler/search.cpp. That uses a class called
> > > NepomukSearch that uses the QueryServiceClient you mentioned.
> > > 
> > > What I don't know is how/when the virtual collection's content is
> > > updated.
> > > 
> > > I hope this gives you some clue where to look for the code you're
> > > searching for.
> > > 
> > > Andras
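
For completeness, creating such a persistent search collection from the client 
side looks roughly like this (a sketch from memory; the SPARQL string is only 
illustrative):

#include <Akonadi/SearchCreateJob>
#include <Akonadi/Collection>

// Rough sketch of what the KMail search dialog does via SearchCreateJob.
const QString sparql = QLatin1String(
    "SELECT ?r WHERE { ?r a nmo:Email ; nmo:messageSubject ?s . "
    "FILTER(REGEX(?s, 'akonadi', 'i')) }" );

Akonadi::SearchCreateJob *job =
    new Akonadi::SearchCreateJob( QLatin1String( "Search: akonadi" ), sparql, this );
connect( job, SIGNAL(result(KJob*)), this, SLOT(slotSearchCreated(KJob*)) );

// In slotSearchCreated(), job->createdCollection() returns the new virtual
// collection, which the Akonadi server keeps populated via NepomukSearch.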