[Kde-pim] More Nepomuk Feeder Improvements

Wed May 29 12:14:04 BST 2013

On Wednesday 29 May 2013 14.47:07 Vishesh Handa wrote:
> Hey Christian
> 

Hey Vishesh,

> I've made some more improvements to the nepomuk feeder. Most of it is
> simple stuff like scheduling the operations better, and reacting to config
> changes. Others are just simple cleanups.
> 
> The biggest change is probably that I have removed the half-hour and hourly
> checks for emails. 
> Also I've removed the whole concept of batch indexing.
> 

Not a good idea IMO

That code has been there to detect mass insertions of new items.

Let's say the feeder gets notifications for 300'000 items within a minute or 
so (I just added my email account), and then another 300'000 notifications 
because I just removed my other account.

That leads to:
* huge queues (shouldn't be a big deal, but maybe only store the item ids 
instead of the full items for the ones to remove)
* all items end up in the high prio queue

=> The feeder is utterly useless until you restart it, because it will be busy 
doing stuff that it should do in background processing (and that is not really 
relevant atm).

The previous code prevented that by simply skipping the batch, so we don't 
mistake mass changes for actually relevant stuff (I don't care if my email 
account takes a day to index, but I want the note I just added to be 
immediately available in search).

The regular FindUnindexedJob would then retrieve the skipped items again and 
properly schedule them as background processing work. It was only executed if 
such a batch was actually added and items were skipped, so not during normal 
operation.
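
To make the batch-skipping idea concrete, here is a rough standalone sketch. 
It is not the feeder's actual code: BatchDetector, its threshold and its time 
window are names and values I made up for illustration, and the real check may 
well look different.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of the feeder): counts notifications per
// time window and flags everything above a threshold as a mass change.
class BatchDetector {
public:
    using Clock = std::chrono::steady_clock;

    BatchDetector(std::size_t maxItemsPerWindow, std::chrono::milliseconds window)
        : m_maxItems(maxItemsPerWindow), m_window(window) {}

    // Returns true if the item should go to the high prio queue now,
    // false if it looks like part of a batch and should be skipped.
    bool shouldIndexNow(Clock::time_point now) {
        // Start a fresh window on the first call or when the old one expired.
        if (!m_haveWindow || now - m_windowStart > m_window) {
            m_windowStart = now;
            m_count = 0;
            m_haveWindow = true;
        }
        ++m_count;
        if (m_count > m_maxItems) {
            m_skipped = true;   // remember that a background scan is needed
            return false;
        }
        return true;
    }

    // True once at least one item was skipped; this is the only case in
    // which something like FindUnindexedJob would have to run.
    bool needsBackgroundScan() const { return m_skipped; }

    // Call after the background scan has been scheduled.
    void backgroundScanScheduled() { m_skipped = false; }

private:
    std::size_t m_maxItems;
    std::chrono::milliseconds m_window;
    Clock::time_point m_windowStart{};
    std::size_t m_count = 0;
    bool m_haveWindow = false;
    bool m_skipped = false;
};
```

With something like this, a single new note stays below the threshold and is 
indexed immediately, while the 300'000 items of a new account are skipped and 
only picked up later by the background scan.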

I'd rather not lose that.

> Could you please have a look? If everything is okay, then I'll merge into
> master.
> 
> I still have some more changes I would like to do in the feeder. I'm still
> not satisfied with the speed or how the jobs are scheduled.

> commit b27b6ddf9bd4da656c3f44aa4281235554671739
> Author: Vishesh Handa <me at vhanda.in>
> Date:   Fri May 24 20:42:01 2013 +0530
> 
>     Nepomuk Feeder Agent: Send new collections to the Scheduler
>     
>     Instead of indexing them yourself. Also, we should index the contents of
>     the collection as well.

You get item added signals for the items, so I don't think you need to add the 
collection to the scheduler.

The collection indexing codepath now really only exists for manual indexing 
of a collection.

* I'd generally avoid using full Akonadi::Item in any list as these can grow 
huge and the Item::Id should be enough.
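
A minimal sketch of what I mean, with stand-in types instead of the real 
Akonadi classes. ItemId, Item and RemovalQueue are hypothetical names, and I'm 
assuming the id is a plain 64-bit integer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

using ItemId = std::int64_t;   // stand-in for Akonadi::Item::Id (assumption)

// Stand-in for a full item; the payload is what makes the lists grow huge.
struct Item {
    ItemId id;
    std::string payload;
};

// Hypothetical queue that keeps only the id of each item to process.
class RemovalQueue {
public:
    // Only the id is copied; the payload is never stored in the queue.
    void enqueue(const Item &item) { m_ids.push_back(item.id); }

    bool empty() const { return m_ids.empty(); }
    std::size_t size() const { return m_ids.size(); }

    // Returns the oldest queued id. Must not be called on an empty queue.
    ItemId takeNext() {
        ItemId id = m_ids.front();
        m_ids.pop_front();
        return id;
    }

private:
    std::deque<ItemId> m_ids;
};
```

For removals the id is all that is needed anyway, so nothing is lost by 
dropping the rest of the item.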

* IndexScheduler::setReindexing only bypasses the check if an item has already 
been indexed. It shouldn't call anything.

If it's not set, indexing a collection is much faster when most of the items 
have already been indexed. If it is set, each item is indexed again regardless 
of whether it's already in Nepomuk or not.
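
Roughly, the flag only changes which items get selected. A standalone sketch 
of that selection, where itemsToIndex and the in-memory set of indexed ids are 
hypothetical stand-ins (the real code queries Nepomuk instead):

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

using ItemId = std::int64_t;   // stand-in for Akonadi::Item::Id (assumption)

// Returns the ids of a collection that have to be (re)indexed.
// With reindexing == false, items that are already indexed are skipped.
// With reindexing == true, the "already indexed" check is bypassed.
std::vector<ItemId> itemsToIndex(const std::vector<ItemId> &collectionItems,
                                 const std::set<ItemId> &alreadyIndexed,
                                 bool reindexing)
{
    std::vector<ItemId> result;
    for (ItemId id : collectionItems) {
        if (reindexing || alreadyIndexed.count(id) == 0)
            result.push_back(id);
    }
    return result;
}
```

So the setter itself only has to store the bool; the actual work happens 
whenever a collection is indexed.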

The whole collection indexing codepath used to exist to check for each item 
whether it's indexed or not (one query per item). 
Now that we have the FindUnindexedJob it's only for manual indexing. I'd still 
keep it around for that though.

Cheers,
Christian
