[Kde-pim] Updating persistent search

Mon Dec 16 08:39:56 GMT 2013

On Saturday 14 December 2013 18.35:18 Daniel Vrátil wrote:
> Hi folks,
> 

Hey Dan,

> I started adding support for agent search (so that we can query for example
> IMAP server when using online IMAP and emails are not indexed in Nepomuk)
> and adding support for search plugins (so that we don't have to copy-paste
> parts of Baloo into Akonadi like with Nepomuk).
> 
> Now I'm working on the persistent searches and ran into a problem with
> updating the search folders. If the query contains a match against a header
> that's not indexed by Nepomuk, or when we have online IMAP and there's a
> match against content of the message, we have a problem, as we won't get
> complete results.
> 

Are you referring to the situation where an item that is only partially 
available locally matches the query?

> I want to avoid having incomplete results, because it makes the entire
> feature look broken and it's confusing to users ("huh, why did my email
> suddenly appear in the search folder after I read it?").
> 

This seems to refer to keeping queries open so new items are added as they 
become available?

> 
> Proposed solutions:
> 
> A) Don't support persistent search with matches against non-standard headers
> and body content
> 
> PROS: we can guarantee complete results by re-running all queries every time
> an item is added or modified (and optimize it by enqueueing the items and
> running the queries after some timeout, so that we get some reasonable
> performance when fetching many new emails, or when mass-changing flags.
> With Baloo, the queries are reasonably fast and won't cause your computer
> to self-ignite)
> 
> CONS: it removes a feature and makes the persistent search less useful
> 
> 
> 
> B) Don't support persistent search with matches against body content
> 
> PROS: see A)
> 
> CONS: see A) + requires indexing all headers into Nepomuk, which will make
> it bigger and will slow down the queries. Also I don't know whether Vishesh
> is actually willing to do that.
> 
> 
> 
> C) Support persistent search with matches against non-standard headers and
> body content
> 
> PROS: we don't lose any features
> 
> CONS: see B) + we have to also use the agent search feature, which requires
> the resource to talk to the remote server, get results and push them back
> to Akonadi. This is not just very slow, but also consumes a lot of
> bandwidth, eats some additional CPU and memory and does not work when the
> computer is offline. The offline problem makes it the least preferred
> solution to me, as it would cause the results to be incomplete (at least
> until we are back online, and can run all queries again).
> 
> 
> 
> D) Don't support updating search folders
> 
> PROS: we run the query when the folder is created, which allows us to use
> even the agent-search feature to handle non-standard headers and body
> search and to get complete results
> 
> CONS: the search folder will become a snapshot of results at a certain point
> in time
> 
> 
> I would personally go with A) or D). Both guarantee complete results while
> keeping the code simple and run-time resources at a reasonable level. D) is
> also the current state of the persistent search in stable, because
> auto-updating has been disabled due to updates being too expensive with
> Nepomuk. It's however considered a regression and a temporary measure. If
> we want to go with D), we have to communicate to users somehow that it's no
> longer a regression, but the way things work.
> 
> 

What I would expect is:
* local cache is tried first (if we expect this to be faster), agent search is 
the fallback
* body can be searched in either case (are you saying we're no longer indexing 
the body?)
* If the search is configured to update itself, new items appear in the result 
set as the necessary parts are indexed
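
As a rough illustration of the first point (local cache first, agent search as 
the fallback), here is a minimal Python sketch. It is not Akonadi code: the 
SearchBackend type and the convention that a backend returns None when it 
cannot answer a query completely are assumptions made for the example.

```python
from typing import Callable, Optional, Set

# Hypothetical signature: a backend takes a query string and returns the ids
# of matching items, or None if it cannot answer the query completely
# (for example, the local index does not cover the queried part).
SearchBackend = Callable[[str], Optional[Set[int]]]


def search(query: str, local: SearchBackend, agent: SearchBackend) -> Set[int]:
    """Try the local index first and fall back to the agent search."""
    result = local(query)
    if result is not None:
        return result
    # The local index could not answer; ask the agent (e.g. the IMAP server).
    remote = agent(query)
    return remote if remote is not None else set()
```

In this sketch an agent that is offline would simply return None as well, which 
yields an empty result instead of an error; whether that is acceptable is 
exactly the completeness question raised in the quoted mail.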

The auto-updating queries are a nice feature, and with the local indexing the 
server can test only the modified/new items to implement this while not 
becoming a complete performance hog. So IMO that would be pretty cool, but 
doesn't have to be part of the first iteration.
In any case it probably makes sense to allow the application to control what 
it wants (snapshot or auto-updating).
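
A small Python sketch of that idea, testing only the new or modified item 
against a stored query and letting the creator choose between snapshot and 
auto-updating. All names here (PersistentSearch, item_changed, auto_update) 
are hypothetical and only illustrate the approach; they are not Akonadi API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# Hypothetical item representation: only the parts that are available locally.
Item = Dict[str, str]
Predicate = Callable[[Item], bool]


@dataclass
class PersistentSearch:
    matches: Predicate            # evaluates whatever parts the item contains
    auto_update: bool = True      # False gives "snapshot" behaviour
    results: Set[int] = field(default_factory=set)

    def item_changed(self, item_id: int, item: Item) -> None:
        """Re-test one new or modified item instead of re-running the query."""
        if not self.auto_update:
            return
        if self.matches(item):
            self.results.add(item_id)
        else:
            self.results.discard(item_id)
```

With this shape an item enters the result set once the part it matches on 
becomes available locally, and leaves it again if a later change makes it stop 
matching.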

Since Akonadi is content-agnostic, you shouldn't care about which parts are 
locally available, but just evaluate what's there. IMO it should be up to the 
agent or application to decide such things. For instance, the whole header vs. 
body discussion doesn't really make sense for calendar items as all 
information only becomes available with the body.

Cheers,
Christian
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/