[Kde-pim] Nepomukfeeder updates almost ready

Christian Mollekopf chrigi_1 at fastmail.fm
Sun Dec 23 16:54:18 GMT 2012


Heya,

To cut right to the chase: I revamped the feeders a bit, think the result is 
much better than what we had before, and would like to get it into 4.10. So 
feel free to skip this if you don't care. 

I moved to a recurring, query-based approach for the initial-indexing. That 
means, instead of doing a single initial-indexing when the feeder is executed 
the first time, and relying purely on updates from the change-recorder 
afterwards, the initial-indexing is now more of a maintenance task (currently 
run on every start) that queries for all not-yet-indexed items.

That is necessary, as the initial assumption that we can index items faster 
than notifications come in didn't hold true, which resulted in the feeder 
regularly being overloaded with stuff to index.

The initial query approach resulted in n queries for n items, which is way too 
slow to be feasible for all items (it takes ages, literally). The only 
alternative approach I found is to run two queries, one in akonadi and one in 
nepomuk, each querying for *all* available items. Comparing the two lists 
yields the list of items which have not been indexed yet. Of course, that 
misses items which have been indexed before but have been modified since then, 
so it's not ideal either.
These queries are fairly efficient as they result in a single sql query per db 
(as opposed to n), although with a huge result set. I could query my db of 
~100'000 items in ~20s (i7 processor).
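
To illustrate the list comparison, here is a minimal sketch in Python (not the 
actual feeder code, which is C++). The function name and the plain integer ids 
are assumptions made for the example; the two input lists stand for the 
results of the akonadi query and the nepomuk query.

```python
def find_unindexed(akonadi_ids, nepomuk_ids):
    """Return the ids known to akonadi that nepomuk has not indexed yet.

    Both arguments are iterables of item ids, each assumed to come from
    one bulk query against the respective database.
    """
    # Set difference: everything in akonadi minus everything already indexed.
    # Sorting only makes the result deterministic for the example.
    return sorted(set(akonadi_ids) - set(nepomuk_ids))


if __name__ == "__main__":
    print(find_unindexed([1, 2, 3, 4, 5], [2, 4]))
```

Note that this only compares ids, so an item that was indexed once and 
modified later is not reported, which is the limitation described above.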

Since I figured changes on emails, which are mostly just flags, are 
negligible, I switched the email initial-indexing to that new approach.

Non-email items continue to be indexed as usual, meaning there is one query 
per item, which allows us to detect modifications as well. That is slow as 
usual, but since we usually have a lot more email items than non-email items, 
it works well enough.
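
For comparison, a rough Python sketch of the per-item check. How the feeder 
actually detects a modification is not spelled out here, so the comparison of 
modification times, the dict layout and the lookup callback are all 
assumptions made for the example.

```python
def needs_indexing(item, lookup_indexed_mtime):
    """Decide whether a single item has to be (re)indexed.

    item: dict with "id" and "mtime" keys (hypothetical layout).
    lookup_indexed_mtime: callable that stands in for the one query per
    item; it returns the mtime stored at indexing time, or None if the
    item has never been indexed.
    """
    indexed_mtime = lookup_indexed_mtime(item["id"])
    # Never indexed, or changed since it was indexed.
    return indexed_mtime is None or indexed_mtime < item["mtime"]


if __name__ == "__main__":
    index = {1: 100, 2: 50}  # id -> mtime at indexing time
    items = [
        {"id": 1, "mtime": 100},  # unchanged
        {"id": 2, "mtime": 80},   # modified after indexing
        {"id": 3, "mtime": 10},   # never indexed
    ]
    print([i["id"] for i in items if needs_indexing(i, index.get)])
```

This costs one lookup per item, which is why it only stays reasonable for the 
comparatively small number of non-email items.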

Another important advantage is that we can now also skip large batches of 
new/changed items, knowing they will be picked up by the initial-indexing 
eventually. That also allows us to turn off the change-recorder when the 
feeder is turned off (which is another problem if we rely on the 
change-recorder too much).
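
The batch skipping could look roughly like the following Python sketch. The 
class name, the threshold of 100 items and the counter are assumptions made 
for the example and are not taken from the patch.

```python
from collections import deque


class BatchSkippingQueue:
    """Queue of item ids to index that drops oversized notification batches.

    Dropped items are not lost for good: the recurring initial-indexing
    is expected to find them later as not yet indexed.
    """

    def __init__(self, max_batch=100):
        self.max_batch = max_batch  # hypothetical threshold
        self.pending = deque()
        self.skipped = 0

    def add_batch(self, item_ids):
        """Queue the batch, or skip it entirely if it is too large."""
        item_ids = list(item_ids)
        if len(item_ids) > self.max_batch:
            self.skipped += len(item_ids)
            return False
        self.pending.extend(item_ids)
        return True


if __name__ == "__main__":
    queue = BatchSkippingQueue(max_batch=3)
    queue.add_batch([1, 2])         # small batch, queued
    queue.add_batch(range(10, 20))  # large batch, skipped
    print(list(queue.pending), queue.skipped)
```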

One remaining problem is that we get loads of notifications of changed/added 
items, which I think are mostly due to sync-on-demand updates refreshing the 
cache (and not actual new emails or whatnot). I also often get flag change 
notifications on my offline imap accounts, and I don't really know why yet. 
That of course would lead to loads of items being indexed over and over again, 
but that can be mitigated somewhat since we can now skip larger batches of 
items.

Besides that, I made some performance improvements, such as the cache I 
mentioned previously (200% performance boost), and new items are now indexed 
without any queries, which gives another boost of 10%-20% or so.

Overall, I think we should get this into 4.10 as fast as possible. The patch 
is somewhat large (and way too late in the process), but IMO the previous 
feeders are broken enough to justify this. So what do you think? Should I 
commit this to 4.10 in a couple of commits, or only master and then backport 
it for 4.10.1?

Cheers,
Christian