[Kde-pim] Nepomukfeeder Caching

Tue Dec 4 21:38:10 GMT 2012

On Tuesday 04 December 2012 21:14:34 Christian Mollekopf wrote:
> On Tue, Dec 4, 2012, at 08:15 PM, Milian Wolff wrote:
> > On Tuesday 04 December 2012 19:44:56 Christian Mollekopf wrote:
> > > On Tue, Dec 4, 2012, at 07:19 PM, Vishesh Handa wrote:
> > > > On Tue, Dec 4, 2012 at 11:24 PM, Christian Mollekopf <chrigi_1 at fastmail.fm> wrote:
> > > > > Hey,
> > > > > 
> > > > > I know you suggested little specific caches for the individual
> > > > > properties, but IMO the PropertyCache solves the problem in a much
> > > > > more elegant way. You may remember that I suggested doing the caching
> > > > > inside nepomuk, the property cache is almost what you'd need for
> > > > > that, apart from the code which figures out which properties to cache
> > > > > (I'm not suggesting you should do that now).
> > > > > 
> > > > > What I especially like about it is that it takes like 4 lines of
> > > > > code in a single place to add caching to the whole feeder, which
> > > > > could cache any number of properties, so I'm really quite happy with
> > > > > the result. Also the whole cache would be fully reusable.
> > > > 
> > > > The design seems pretty cool.
> > > 
> > > Thanks ;-)
> > > 
> > > > > I know I traded some tweakability and performance-costs for that
> > > > > design, but as long as no real issues occur with it I consider it
> > > > > superior (it seems to get the job done pretty well so far).
> > > > > 
> > > > > In any case, I know what you meant when you suggested caching and
> > > > > did it deliberately otherwise =)
> > > > > I'd be interested what you think about it anyways ;-)
> > > > 
> > > > For now, I think looking up the entire SimpleResource would be very
> > > > expensive. It would consist of one hash, which would involve iterating
> > > > over all the properties and values, and then doing an operator==,
> > > > which would require another iteration over all the properties. This
> > > > would involve a ton of QUrl and Soprano::Node comparisons, none of
> > > > which are that cheap.
> > > 
> > > I thought so too at first, but figured I'd give it a shot anyways as I
> > > misjudged the cost of certain operations before (or underestimated my
> > > cpu). Fact is I don't even see the activity of the nepomukfeeder process
> > > while indexing a bunch of mails, and virtuoso still maxes out a core.
> > > This might be different for weaker CPUs but I think the hash
> > > calculation is just not that expensive.
> > > 
> > > If somebody on an older machine could give it a shot that would be nice.
> > > 
> > > It's easy to test using the nepomukpimindexerutility in
> > > kdepim-runtime/agents/nepomukfeeder/util:
> > > * try master first:
> > > * Select a bunch of messages
> > > * right-click and select re-index
> > > * look at the time it took
> > > * checkout the branch with the cache
> > > * retry the same set of messages
> > > * watch the akonadi_nepomuk_feeder process for activity while indexing
> > > * notice that only one third of the time was used to index
> > 
> > With my hobby-performance-wizard hat on, why did you not write a proper
> > QBENCHMARK for this scenario (yet)? Is it b/c the whole PIM stack is too
> > complicated to model? There are unit tests though, so writing a benchmark
> > should also be possible in theory, no?
> 
> The unit-tests aren't really unittests as they work on the normal
> akonadi/nepomuk databases and involve all parts of the system. I failed
> to properly unit-test this code so far as it is pretty difficult to
> decouple the functionality from akonadi and the alternative of running
> separate akonadi/nepomuk instances isn't really attractive for all tests
> either.

You most certainly don't want to work on your "normal" databases, see:

http://techbase.kde.org/Projects/PIM/Akonadi/Testing

Too bad there is nothing about this "Akonadi Benchmarker" in there - would be 
potentially interesting. Maybe someone from the PIM team has more information 
on this?

> For the benchmark we'd have similar problems, and when talking about the
> cache only, it would just be comparing the cache's performance
> to no caching, as the cost of not having the cache ends up on the
> nepomuk side.
> 
> A benchmark which just benchmarks the roundtrip-time of indexing a
> couple of items repeatedly would be possible, I was just too lazy to
> write it so far / didn't see the need for it yet. It would essentially
> automate the manual steps above.
> That wouldn't give us any information about how the load between nepomuk
> and the feeder has changed though.

If the load is less, the roundtrip should be done in less time, no? I 
personally think it would be a very good idea to have such a benchmark. 
Especially considering this:

> I quickly ran the pimindexerutility through vtune with the following
> result (I zoomed in on the indexing):
> http://tinypic.com/r/o9pid4/6

You should *filter* in on the indexing, e.g. mark one/both "CPU Usage" peaks 
and look at them separately. But I think these spikes are far too small to get 
useful output from a sampling-based profiler like VTune (the spikes look to be 
~200ms wide so that means ~20 samples).

Anyhow, that just shows that it's probably not worth spending time on 
optimizing the hash function right now.
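
For anyone skimming the thread, the lookup cost Vishesh describes above has roughly the shape of the sketch below. It is standard C++ only and is not the actual PropertyCache or SimpleResource code: Resource, ResourceHash and ResourceCache are hypothetical names, and plain strings stand in for QUrl and Soprano::Node. The point is only that one lookup walks every property and value once for the hash and, when the hash matches, a second time for operator==.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a SimpleResource: property URI -> set of values.
// Ordered containers make two equal resources iterate in the same order.
using Resource = std::map<std::string, std::set<std::string>>;

// Hashes the whole resource by visiting every property and every value.
struct ResourceHash {
    std::size_t operator()(const Resource& resource) const {
        std::size_t seed = 0;
        const std::hash<std::string> hashString;
        auto mix = [&seed](std::size_t v) {
            seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        };
        for (const auto& entry : resource) {
            mix(hashString(entry.first));
            for (const auto& value : entry.second)
                mix(hashString(value));
        }
        return seed;
    }
};

// Maps a complete resource description to the URI it was stored under.
class ResourceCache {
public:
    // Returns the cached URI, or an empty string on a miss.
    // A hit costs one full hash plus one full operator== comparison.
    std::string find(const Resource& resource) const {
        const auto it = m_entries.find(resource);
        return it == m_entries.end() ? std::string() : it->second;
    }

    void insert(const Resource& resource, const std::string& uri) {
        assert(!uri.empty()); // an empty string is reserved for "miss"
        m_entries[resource] = uri;
    }

private:
    std::unordered_map<Resource, std::string, ResourceHash> m_entries;
};
```

A caller would build the Resource for, say, a contact, call find(), and only hand the resource to the storage backend on a miss, followed by insert() with the URI it got back.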

And as I said above, with a proper benchmark you could increase the size of 
the problem until you get a reasonable timeframe which can be used to get 
meaningful results from the sampling statistics.
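
A minimal sketch of what "increase the size of the problem" could look like, assuming the indexing roundtrip can be wrapped in a callable that takes an item count. This is plain C++ with std::chrono rather than QBENCHMARK, and scaleUntil, BenchResult and the indexItems parameter are hypothetical names, not anything that exists in kdepim-runtime.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Item count and wall-clock time of the last (long enough) run.
struct BenchResult {
    std::size_t items;
    double seconds;
};

// Runs indexItems(n) with a doubling n until a single run takes at least
// minSeconds, or until maxItems is reached, and reports that run.
BenchResult scaleUntil(const std::function<void(std::size_t)>& indexItems,
                       double minSeconds,
                       std::size_t startItems = 16,
                       std::size_t maxItems = 1u << 20)
{
    assert(startItems > 0);
    using Clock = std::chrono::steady_clock;
    std::size_t n = startItems;
    for (;;) {
        const auto begin = Clock::now();
        indexItems(n); // the workload under test, e.g. re-indexing n mails
        const std::chrono::duration<double> elapsed = Clock::now() - begin;
        if (elapsed.count() >= minSeconds || n >= maxItems)
            return {n, elapsed.count()};
        n *= 2;
    }
}
```

Called with something like minSeconds = 10.0, the final run is the one worth attaching the profiler to, since it is long enough to collect far more samples than a ~200ms spike.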

> As you can see, the akonadi framework is more of a performance problem
> (due to the fetching of the items) than the hashing, so I think it's
> safe to assume that the solution works well enough for the time being.
> Note that also the spikes you see are akonadi related and have nothing
> to do with the hashing.

Where do you see akonadi in the above? The 999ms spent in main are probably 
the event loop (also compare to the CPU Usage below). I don't know what 
exactly happens, but it could quite probably indicate that the time is just 
spent on waiting for a result from nepomuk/whatever, and not really akonadi 
(since there is no CPU activity).

> There are still many other performance improvements in other places to
> do ;-)

I bet that's true :)

Cheers

-- 
Milian Wolff
mail at milianw.de
http://milianw.de