[Kde-pim] Akonadi Nepomuk Email Searching

Sun Dec 16 18:12:46 GMT 2012

On Sun, Dec 16, 2012 at 6:12 PM, Volker Krause <vkrause at kde.org> wrote:

> >
> > > However, as soon as KMail is running, someone would be watching these
> > > folders.
> > > How expensive is that?
> >
> > No. Someone would only be watching that folder when that folder is open.
> > Is that not the case?
>
> The KMail folder tree shows unread counts and similar statistics for all
> mail folders. For that to be up to date we need to monitor for changes.
> There's a dedicated signal for statistics changes in the API, in case that's
> cheaper to query than what actually changed in the content. But I doubt that
> makes a difference here.
>

Yeah. Doesn't make much of a difference.

> >
> > So, now, how do we proceed? Do we completely disable real-time update of
> > search collections?
>
> Hm, this is very expensive indeed. Instead of degrading the entire system we
> should indeed disable or limit query re-running. Is there a way to manually
> trigger a re-run? I guess we could do that similarly to on-demand syncing as
> a workaround until we find a better solution.
>

Not right now, but I can provide it if you want. I have been thinking of
disabling these automatic live queries by default and requiring the user to
enable them explicitly. I could possibly add a 'checkForUpdates()' function as
well.

> Live updating search folders are a killer feature I really would like to see
> eventually (not just for emails), so I'm wondering if we can make this work by
> introducing some constraints on the queries maybe, or by giving external
> entities (Akonadi in this case) more control over when a query is
> re-evaluated. Some naive ideas:
>
> - if we constrain queries to be "self-contained", i.e. only depend on
> properties of a specific item, not properties of an item it relates to, it
> should be enough to evaluate the query on only the new/changed resource,
> rather than globally, right? E.g. "mails with tag foo" would work this way,
> "mails from people with tag foo" would not. I'd imagine this could bring a
> massive scalability improvement. Detecting whether a query complies with
> these constraints should even be possible automatically.
>

Hmm. I'm not sure how we would go about this, especially for queries
containing words like "search term".
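A minimal C++ sketch of the "self-contained" idea, for illustration only: the Mail struct and IncrementalQuery class are hypothetical and not part of the Akonadi or Nepomuk APIs. Because the predicate looks only at the mail it is given, each change notification costs one predicate evaluation instead of a global re-run. A relational query such as "mails from people with tag foo" could not be maintained this way, since a change to a contact would affect mails that were never touched.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical model of an indexed mail (not an Akonadi/Nepomuk type).
struct Mail {
    int id;
    std::set<std::string> tags;
};

// A "self-contained" query: whether a mail matches depends only on that mail's
// own properties, never on other resources it relates to.
using ItemPredicate = std::function<bool(const Mail&)>;

// Keeps a live result set current by re-evaluating the predicate only on the
// mail that changed, instead of re-running the query over the whole store.
class IncrementalQuery {
public:
    explicit IncrementalQuery(ItemPredicate pred) : pred_(std::move(pred)) {
        assert(pred_ && "query predicate must be callable");
    }

    // New or modified mail: one predicate evaluation, then add or drop its id.
    void itemChanged(const Mail& mail) {
        if (pred_(mail)) {
            results_.insert(mail.id);
        } else {
            results_.erase(mail.id);
        }
    }

    // Deleted mail: it can no longer match.
    void itemRemoved(int id) { results_.erase(id); }

    const std::set<int>& results() const { return results_; }

private:
    ItemPredicate pred_;
    std::set<int> results_;
};
```

For "mails with tag foo" the predicate would be something like `[](const Mail& m) { return m.tags.count("foo") > 0; }`.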

> - compress/schedule query re-runs. E.g. for 100 mails to index in a batch we
> only need to re-run the query once. A 500 msec query would turn from
> unusable to quite fine this way I think. This works even better if users
> (such as Akonadi) can provide their knowledge about good timings for the
> re-run, e.g. after a mail sync has finished.
>

This is completely feasible. As I said, how about a 'checkForNewUpdates()'
function. Or maybe a way to enable/disable live updates of queries. When
indexing emails, you can disable it and then later on enable it.

Though I don't particularly like this approach either, as it involves a
number of queries always running on Nepomuk startup.
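A minimal C++ sketch of compressing re-runs, assuming a hypothetical CoalescingQueryRunner that wraps the expensive query evaluation. This is not an existing Nepomuk class; its checkForUpdates() and setLiveUpdatesEnabled() only mirror the proposals in this thread. Change notifications are counted, the query is re-run at most once per batch while live updates are on, and a caller such as Akonadi can force a re-run, e.g. once a mail sync has finished.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical coalescing wrapper around an expensive query re-run
// (not a Nepomuk API).
class CoalescingQueryRunner {
public:
    // rerun: the expensive global query evaluation.
    // batchSize: while live updates are on, re-run once per this many changes.
    CoalescingQueryRunner(std::function<void()> rerun, std::size_t batchSize)
        : rerun_(std::move(rerun)), batchSize_(batchSize) {
        assert(rerun_ && batchSize_ > 0);
    }

    // Called once per newly indexed item; only counts changes and re-runs
    // when a full batch has accumulated.
    void itemIndexed() {
        ++pending_;
        if (liveUpdates_ && pending_ >= batchSize_) {
            runNow();
        }
    }

    // Explicit trigger, e.g. after a mail sync has finished.
    void checkForUpdates() {
        if (pending_ > 0) {
            runNow();
        }
    }

    // While disabled, changes are only recorded; re-enabling flushes them.
    void setLiveUpdatesEnabled(bool enabled) {
        liveUpdates_ = enabled;
        if (enabled) {
            checkForUpdates();
        }
    }

    std::size_t reruns() const { return reruns_; }

private:
    void runNow() {
        pending_ = 0;
        ++reruns_;
        rerun_();
    }

    std::function<void()> rerun_;
    std::size_t batchSize_;
    std::size_t pending_ = 0;
    std::size_t reruns_ = 0;
    bool liveUpdates_ = true;
};
```

With a batch size of 100, indexing 250 mails triggers two re-runs instead of 250, and the remaining 50 are picked up by an explicit checkForUpdates() call. A real implementation would probably also need a timer so that a partial batch is not left pending indefinitely.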

> I know too little about Nepomuk internals to judge if this is feasible or
> would actually address the problem though.
>
> regards,
> Volker
>
> > > On Sunday 09 December 2012 00:13:48 Andras Mantia wrote:
> > > > Hi,
> > > >
> > > > Vishesh Handa wrote:
> > > > > Hey Laurent
> > > > >
> > > > > Over the last couple of weeks, I've slowly started learning more about
> > > > > Akonadi, and I've started using it. The good/bad news is that I have
> > > > > started experiencing all the issues people complain about, so
> > > > > hopefully I can fix them.
> > > > >
> > > > > I tried to search for an email using KMail, and it worked out fine.
> > > > > However, I notice that KMail remembers some of the searches that were
> > > > > performed. I know it uses the QueryServiceClient for this; however, I
> > > > > cannot find the relevant code. Could you perhaps point me towards it?
> > > > >
> > > > > The way the QueryServiceClient is used is bad, because we (Nepomuk)
> > > > > end up re-running the query each time an email is indexed. That makes
> > > > > Virtuoso consume more CPU.
> > > >
> > > > I'm not that familiar with the searches, but from looking at the code:
> > > > - when a search is performed, a virtual collection is created with the
> > > >   search query
> > > > - this is done using Akonadi::SearchCreateJob
> > > >   (kdepimlibs/akonadi/searchcreatejob.cpp)
> > > > - the above triggers the virtual collection creation in the Akonadi
> > > >   server. See akonadi/server/src/handler/searchpersistent.cpp
> > > > - when a search is performed inside Akonadi, it goes through
> > > >   akonadi/server/src/handler/search.cpp. That uses a class called
> > > >   NepomukSearch that uses the QueryServiceClient you mentioned.
> > > >
> > > > What I don't know is how/when the virtual collection's content is
> > > > updated.
> > > >
> > > > I hope this gives you some clue where to look for the code you're
> > > > searching for.
> > > >
> > > > Andras
>
> _______________________________________________
> KDE PIM mailing list kde-pim at kde.org
> https://mail.kde.org/mailman/listinfo/kde-pim
> KDE PIM home page at http://pim.kde.org/
>

-- 
Vishesh Handa