[Kde-pim] Akonadi Nepomuk Email Searching

Volker Krause vkrause at kde.org
Thu Dec 20 07:13:53 GMT 2012


On Sunday 16 December 2012 23:42:46 Vishesh Handa wrote:
> On Sun, Dec 16, 2012 at 6:12 PM, Volker Krause <vkrause at kde.org> wrote:
> > > > However, as soon as KMail is running, someone would be watching these
> > > > folders.
> > > > How expensive is that?
> > > 
> > > No. Someone would only be watching that folder when that folder is open.
> > > Is that not the case?
> > 
> > The KMail folder tree shows unread counts and similar statistics for all
> > mail folders. For that to be up to date we need to monitor for changes.
> > There's a dedicated signal for statistics changes in the API, in case
> > that's cheaper to query than what actually changed in the content. But I
> > doubt that makes a difference here.
> 
> Yeah. Doesn't make much of a difference.
> 
> > > So, now, how do we proceed? Do we completely disable real-time update of
> > > search collections?
> > 
> > Hm, this is very expensive indeed. Instead of degrading the entire system
> > we should indeed disable or limit query re-running. Is there a way to
> > manually trigger a re-run? I guess we could do that similar to on-demand
> > syncing as a workaround until we find a better solution.
> 
> Not right now, but I can provide it if you want. I have been thinking of
> disabling these automatic live queries by default and requiring the user to
> enable them. I could possibly add a 'checkForUpdates()' function as well.
> 
> > Live updating search folders are a killer feature I really would like to
> > see eventually (not just for emails), so I'm wondering if we can make this
> > work by introducing some constraints on the queries maybe, or by giving
> > external entities (Akonadi in this case) more control over when a query is
> > re-evaluated. Some naive ideas:
> > 
> > - if we constrain queries to be "self-contained", i.e. only depend on
> > properties of a specific item, not properties of an item it relates to, it
> > should be enough to evaluate the query on only the new/changed resource,
> > rather than globally, right? E.g. "mails with tag foo" would work this way,
> > "mails from people with tag foo" would not. I'd imagine this could bring a
> > massive scalability improvement. Detecting whether a query complies with
> > these constraints should even be possible automatically.
> 
> Hmm. I'm not sure how we would go about this. Especially for queries
> containing words like "search term".

Do you mean queries like "emails containing the words 'Nepomuk rocks'"? That's
fully self-contained, isn't it? I.e. the query result only depends on
properties of the email resource itself.
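
To make the "self-contained" idea concrete, here is a minimal Python sketch of
a search folder that is updated by evaluating the query on the changed item
only. It is an illustration, not Nepomuk or Akonadi code: `Email`,
`LiveSearchFolder`, `on_item_changed` and `on_item_removed` are hypothetical
names, and the query is assumed to be expressible as a predicate over a single
email.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class Email:
    """Hypothetical stand-in for an indexed email resource."""
    uri: str
    text: str = ""
    tags: Set[str] = field(default_factory=set)


@dataclass
class LiveSearchFolder:
    """Result set of a self-contained query, maintained incrementally."""
    matches: Callable[[Email], bool]  # must depend on the email itself only
    results: Set[str] = field(default_factory=set)

    def on_item_changed(self, email: Email) -> None:
        # Evaluate the predicate on the single new/changed item,
        # instead of re-running the query over the whole store.
        if self.matches(email):
            self.results.add(email.uri)
        else:
            self.results.discard(email.uri)

    def on_item_removed(self, uri: str) -> None:
        self.results.discard(uri)


# "emails containing the words 'Nepomuk rocks'" as a per-item predicate
folder = LiveSearchFolder(matches=lambda m: "nepomuk rocks" in m.text.lower())
folder.on_item_changed(Email("mail:1", "Nepomuk rocks, really"))
folder.on_item_changed(Email("mail:2", "Unrelated text"))
folder.on_item_changed(Email("mail:1", "Edited, no longer matching"))
```

A query like "mails from people with tag foo" does not fit this shape, since a
change to the person (not the mail) can change the result, so it would still
need a global re-run.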

> > - compress/schedule query re-runs. For e.g. 100 mails to index in a batch
> > we only need to re-run the query once. A 500 msec query would turn from
> > unusable to quite fine this way I think. This works even better if users
> > (such as Akonadi) can provide their knowledge about good timings for the
> > re-run, e.g. after a mail sync has finished.
> 
> This is completely feasible. As I said, how about a 'checkForNewUpdates()'
> function? Or maybe a way to enable/disable live updates of queries. When
> indexing emails, you can disable it and then later on enable it.
> 
> Though, I don't particularly like this approach either as it involves a
> number of queries always running on Nepomuk startup.

That's maybe something we can fix for Akonadi though, since we cache the 
results (as kind of symlinks to the actual emails). So, we wouldn't 
necessarily need to re-run queries on startup. 

We do it right now for correctness (Nepomuk content and thus query results
could change while Akonadi is off). This is a theoretical scenario though: it
requires you to switch off Akonadi manually and to use queries that are not
self-contained. For self-contained queries you would need to be able to change
the email itself, and you can't do that with Akonadi being off. So, I think we
can avoid the extra query storm on startup, and just rely on the regular query
scheduling.
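
A rough Python sketch of what I mean by scheduling, under the assumption that
the client can pause and resume re-evaluation around a batch (e.g. a mail
sync) and can seed the result set from its cache. `QueryScheduler` and all of
its methods are hypothetical names for illustration, not an existing Nepomuk
or Akonadi API.

```python
from typing import Callable, Set


class QueryScheduler:
    """Coalesces change notifications into at most one re-run per batch."""

    def __init__(self, run_query: Callable[[], Set[str]],
                 cached_results: Set[str]):
        self._run_query = run_query
        # Start from the cached results: no query is run on startup.
        self.results = set(cached_results)
        self._dirty = False
        self._paused = False
        self.runs = 0  # number of actual query executions

    def notify_changed(self) -> None:
        # Remember that something changed; re-run now only if not paused.
        self._dirty = True
        if not self._paused:
            self.flush()

    def pause(self) -> None:
        self._paused = True

    def resume(self) -> None:
        self._paused = False
        self.flush()

    def flush(self) -> None:
        # Re-run the (expensive) query once if anything changed since last run.
        if not self._dirty:
            return
        self._dirty = False
        self.results = self._run_query()
        self.runs += 1


index = {"mail:1"}
scheduler = QueryScheduler(lambda: set(index), cached_results={"mail:1"})

scheduler.pause()                  # e.g. mail sync / indexing batch starts
for n in range(2, 102):            # 100 mails get indexed
    index.add(f"mail:{n}")
    scheduler.notify_changed()
scheduler.resume()                 # batch finished: one re-run
print(scheduler.runs)              # prints 1
```

With a 500 msec query this would cost one run per batch instead of 100, and
nothing at all on startup as long as the cached results are trusted.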

regards,
Volker

> > > > On Sunday 09 December 2012 00:13:48 Andras Mantia wrote:
> > > > > Hi,
> > > > > 
> > > > > Vishesh Handa wrote:
> > > > > > Hey Laurent
> > > > > > 
> > > > > > Over the last couple of weeks, I've slowly started learning more
> > > > > > about Akonadi, and I've started using it. The good/bad news is
> > > > > > that I have started experiencing all the issues people complain
> > > > > > about, so hopefully I can fix them.
> > > > > > 
> > > > > > I tried to search for an email using kmail, and it worked out
> > > > > > fine. However, I notice that kmail remembers some of the searches
> > > > > > that were performed. I know it uses the QueryServiceClient for
> > > > > > this, I however cannot find the relevant code. Could you perhaps
> > > > > > point me towards it?
> > > > > > 
> > > > > > The way the QueryServiceClient is used is bad, cause we (Nepomuk)
> > > > > > land up re-running the query each time an email is indexed. That
> > > > > > makes virtuoso consume more cpu.
> > > > > 
> > > > > I'm not that familiar with the searches, but by looking at the code:
> > > > > - when a search is performed, a virtual collection is created with the
> > > > > search query
> > > > > - this is done using Akonadi::SearchCreateJob
> > > > > (kdepimlibs/akonadi/searchcreatejob.cpp)
> > > > > - the above triggers the virtual collection creation in the Akonadi
> > > > > server. See akonadi/server/src/handler/searchpersistent.cpp
> > > > > - when a search is performed inside Akonadi, it goes through the
> > > > > akonadi/server/src/handler/search.cpp. That uses a class called
> > > > > NepomukSearch that uses the QueryServiceClient you mentioned.
> > > > > 
> > > > > What I don't know is how/when the virtual collection's content is
> > > > > updated.
> > > > > 
> > > > > I hope this gives you some clue where to look for the code you're
> > > > > searching for.
> > > > > 
> > > > > Andras
> > 
> > _______________________________________________
> > KDE PIM mailing list kde-pim at kde.org
> > https://mail.kde.org/mailman/listinfo/kde-pim
> > KDE PIM home page at http://pim.kde.org/