[Kde-pim] Nepomukfeeder updates almost ready

Fri Dec 28 09:22:08 GMT 2012

I'm all for putting developers' decisions above release cycles, so if the
maintainer thinks that the new code is better, go with it and be ready to
fix possible bugs fast :p

Cheers
On Dec 27, 2012 10:06 PM, "Allen Winter" <winter at kde.org> wrote:

> On Wednesday 26 December 2012 04:44:53 PM Christian Mollekopf wrote:
> > Hey,
> >
> > I made another bunch of fixes, turned the finding of skipped items into a
> > recurring task, and turn the change-recorder off now if the feeder is
> > disabled entirely. In my testing so far this system behaves much better
> > than what we used to have.
> >
> > I plan on committing this to 4.10 if no one objects within the next few
> > days (I'll write a mail to release-team first).
> >
> > The code is here:
> > http://quickgit.kde.org/?p=clones%2Fkdepim-runtime%2Fcmollekopf%2FpimRuntimeClone.git&a=shortlog&h=c2ca91566953c57af119634f65b5bd73bac7e7fa
> >
> > Cheers,
> > Christian
> >
> >
> > On Sunday 23 December 2012 17.54:18 Christian Mollekopf wrote:
> > > Heya,
> > >
> > > To cut right to the chase: I revamped the feeders a bit, think the result
> > > is much better than what we had before, and would like to get it into
> > > 4.10. So feel free to skip this if you don't care.
> > >
> > > I moved to a recurring, query-based approach for the initial-indexing.
> > > That means, instead of doing a single initial-indexing when the feeder is
> > > executed the first time, and relying purely on updates from the
> > > change-recorder afterwards, the initial-indexing is now more of a
> > > maintenance task (which is currently running on every start), and queries
> > > for all not yet indexed items.
> > >
> > > That is necessary, as the initial assumption that we can index items
> > > faster than notifications come in didn't hold true, which resulted in the
> > > feeder regularly being overloaded with stuff to index.
> > >
> > > The initial query approach resulted in n queries for n items, which is
> > > way too slow to be feasible for all items (it is taking ages, literally).
> > > The only alternative approach I found is this: we run two queries, one in
> > > akonadi and one in nepomuk, each querying for *all* available items.
> > > Comparing the two lists results in the list of items which have not been
> > > indexed yet. Of course, that misses any changes on items which have been
> > > indexed before but have been modified since then, so it's not ideal
> > > either.
> > > These queries are fairly efficient as they result in a single sql query
> > > per db (as opposed to n), although with a huge result set. I could query
> > > my db of ~100'000 items in ~20s (i7 processor).
> > >
> > > Since I figured changes on emails, which are mostly just flags, are
> > > negligible, I switched the email initial-indexing to that new approach.
> > >
> > > Non-email items continue to be indexed as usual, meaning there is one
> > > query per item, which allows us to detect modifications as well. That is
> > > slow as usual, but since we usually have a lot more email items than
> > > non-email items, it works well enough.
> > >
> > > Another important advantage is that we can now also skip large batches
> > > of new/changed items, knowing they will be picked up by the
> > > initial-indexing eventually. That also allows us to turn off the
> > > change-recorder when the feeder is turned off (which is another problem
> > > if we rely on the change-recorder too much).
> > >
> > > One remaining problem is that we get loads of notifications of
> > > changed/added items, which I think are mostly due to sync-on-demand
> > > updates updating the cache (and not actual new emails or whatnot). I also
> > > often get flag change notifications on my offline imap accounts, and I
> > > don't really know why yet. That of course would lead to loads of items
> > > being indexed over and over again, but that can be mitigated somewhat
> > > since we can now skip larger batches of items.
> > >
> > > Besides, I made some performance improvements, such as the cache I
> > > mentioned previously (200% performance boost), or that new items are now
> > > indexed without any queries, which gives another boost of 10%-20% or so.
> > >
> > > Overall, I think we should get this into 4.10 as fast as possible. The
> > > patch is somewhat large (and way too late in the process), but IMO the
> > > previous feeders are broken enough to justify this. So what do you think?
> > > Should I commit this to 4.10 in a couple of commits, or only to master and
> > > then backport it for 4.10.1?
> > >
>
> Are there any objections to getting this work committed for 4.10?
> It's awfully late in the release cycle to be pushing for this, but I will
> do so if I get warm-fuzzies from a couple more folks that we need it.
>
> Anyone want to chime in here?
> Please do so ASAP.
>
> _______________________________________________
> KDE PIM mailing list kde-pim at kde.org
> https://mail.kde.org/mailman/listinfo/kde-pim
> KDE PIM home page at http://pim.kde.org/
>