[Kde-pim] Nepomukfeeder updates almost ready
Christian Mollekopf
chrigi_1 at fastmail.fm
Sun Dec 23 16:54:18 GMT 2012
Heya,
To cut right to the chase: I revamped the feeders a bit, I think the result is
much better than what we had before, and I would like to get it into 4.10. So
feel free to skip this mail if you don't care.
I moved to a recurring, query-based approach for the initial-indexing. That
means, instead of doing a single initial-indexing when the feeder is executed
the first time, and relying purely on updates from the change-recorder
afterwards, the initial-indexing is now more of a maintenance task (which
currently runs on every start) that queries for all not yet indexed items.
That is necessary, as the initial assumption that we can index items faster
than notifications come in didn't hold true, which resulted in the feeder
regularly being overloaded with stuff to index.
The initial query approach resulted in n queries for n items, which is way too
slow to be feasible for all items (it takes ages). The only alternative
approach I found is this: we run two queries, one in akonadi and one in
nepomuk, each querying for *all* available items. Comparing the two lists
yields the list of items which have not been indexed yet. Of course, that
misses any changes to items which were indexed before but have been
modified since then, so it's not ideal either.
These queries are fairly efficient, as each results in a single SQL query per
db (as opposed to n), although with a huge result set. I could query my db of
~100'000 items in ~20s (i7 processor).
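
As a rough illustration of the comparison described above (a sketch only, not
the actual feeder code): assuming each of the two bulk queries returns a plain
list of item IDs, the not yet indexed items are simply the set difference. The
function name and the sample IDs are made up for this example.

```python
def find_unindexed(akonadi_ids, nepomuk_ids):
    """Return the IDs known to the store but missing from the index.

    Both arguments stand in for the result of one bulk query each;
    the real queries and ID types are not shown here (assumption).
    """
    indexed = set(nepomuk_ids)
    # Set membership keeps the comparison linear in the number of items.
    return sorted(i for i in set(akonadi_ids) if i not in indexed)


if __name__ == "__main__":
    all_items = [1, 2, 3, 4, 5]   # hypothetical IDs from the akonadi query
    already_indexed = [2, 4]      # hypothetical IDs from the nepomuk query
    print(find_unindexed(all_items, already_indexed))
```

Note that an item which was indexed earlier and modified afterwards appears in
both lists, so this comparison cannot detect it.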
Since I figured changes on emails, which are mostly just flags, are
negligible, I switched the email initial-indexing to that new approach.
Non-email items continue to be indexed as usual, meaning there is one query
per item, which allows us to detect modifications as well. That is slow as
usual, but since we usually have a lot more email items than non-email items,
it works well enough.
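
The split between email and non-email items could be sketched roughly as
follows. This is only an illustration under assumptions: the `Item` type, the
integer modification stamp, and the `indexed_stamps` dictionary (standing in
for what the index knows about each item) are all hypothetical and not taken
from the feeder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    is_email: bool
    modified: int  # hypothetical modification stamp


def items_to_index(items, indexed_stamps):
    """Pick the items that need (re)indexing.

    indexed_stamps maps item_id -> modification stamp at indexing time.
    Emails only get a presence check (the bulk comparison); other items
    get a per-item lookup that also detects modifications.
    """
    indexed_ids = set(indexed_stamps)
    todo = []
    for item in items:
        if item.is_email:
            # Bulk approach: changes to already indexed emails are ignored.
            if item.item_id not in indexed_ids:
                todo.append(item.item_id)
        else:
            # Stand-in for the one-query-per-item check.
            stamp = indexed_stamps.get(item.item_id)
            if stamp is None or stamp < item.modified:
                todo.append(item.item_id)
    return todo
```

For example, an email that was indexed before and has since changed its flags
is skipped, while a modified non-email item is picked up again.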
Another important advantage is that we can now also skip large batches of
new/changed items, knowing they will eventually be picked up by the
initial-indexing. That also allows us to turn off the change-recorder when the
feeder is turned off (which is another problem if we rely on the
change-recorder too much).
One remaining problem is that we get loads of notifications for changed/added
items, which I think are mostly due to sync-on-demand updates refreshing the
cache (and not actual new emails or whatnot). I also often get flag-change
notifications on my offline IMAP accounts, and I don't really know why yet.
That of course would lead to loads of items being indexed over and over again,
but it can be mitigated somewhat since we can now skip larger batches of
items.
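
One way to picture the batch skipping is a bounded queue that refuses a batch
when the backlog would grow too large, leaving those items for the next
maintenance pass. This is a sketch under assumptions: the mail does not say
how the feeder decides when to skip, so the `max_pending` threshold and the
`FeederQueue` class are invented for illustration.

```python
from collections import deque


class FeederQueue:
    """Queue of item IDs waiting to be indexed (hypothetical helper)."""

    def __init__(self, max_pending=1000):
        self.max_pending = max_pending  # assumed overload threshold
        self.pending = deque()
        self.skipped = 0

    def add_batch(self, item_ids):
        """Queue a batch, or skip it entirely if the backlog is too large.

        Skipped items are not lost: the recurring query for not yet
        indexed items is expected to find them later.
        """
        item_ids = list(item_ids)
        if len(self.pending) + len(item_ids) > self.max_pending:
            self.skipped += len(item_ids)
            return False
        self.pending.extend(item_ids)
        return True
```

Skipping whole batches keeps the backlog bounded at the cost of delayed
indexing for the skipped items.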
Besides, I made some performance improvements, such as the cache I mentioned
previously (200% performance boost), and new items are now indexed without
any queries, which gives another boost of 10%-20% or so.
Overall, I think we should get this into 4.10 as fast as possible. The patch
is somewhat large (and way too late in the process), but IMO the previous
feeders are broken enough to justify this. So what do you think? Should I
commit this to 4.10 in a couple of commits, or only to master and then
backport it for 4.10.1?
Cheers,
Christian
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/