[Kde-pim] Nepomukfeeder Caching

Christian Mollekopf chrigi_1 at fastmail.fm
Tue Dec 4 20:14:34 GMT 2012



On Tue, Dec 4, 2012, at 08:15 PM, Milian Wolff wrote:
> On Tuesday 04 December 2012 19:44:56 Christian Mollekopf wrote:
> > On Tue, Dec 4, 2012, at 07:19 PM, Vishesh Handa wrote:
> > > On Tue, Dec 4, 2012 at 11:24 PM, Christian Mollekopf
> > > <chrigi_1 at fastmail.fm> wrote:
> > > > Hey,
> > > > 
> > > > I know you suggested small, specific caches for the individual
> > > > properties, but IMO the PropertyCache solves the problem in a much more
> > > > elegant way. You may remember that I suggested doing the caching inside
> > > > nepomuk; the property cache is almost what you'd need for that, apart
> > > > from the code which figures out which properties to cache (I'm not
> > > > suggesting you should do that now).
> > > > 
> > > > What I especially like about it is that it takes about 4 lines of code
> > > > in a single place to add caching to the whole feeder, and it can cache
> > > > any number of properties, so I'm really quite happy with the result.
> > > > The whole cache is also fully reusable.
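> > > > 
> > > > To give an idea, a minimal sketch of how such a cache plugs in (the
> > > > names here are made up for illustration, not the actual PropertyCache
> > > > API):
> > > > 
> > > >     // Hypothetical sketch: a generic cache mapping an already-stored
> > > >     // SimpleResource to the final uri it was stored under. Requires a
> > > >     // qHash() and an operator== for SimpleResource.
> > > >     class PropertyCache
> > > >     {
> > > >     public:
> > > >         // Returns the stored uri of an identical resource, or an empty
> > > >         // QUrl on a cache miss.
> > > >         QUrl lookup( const Nepomuk2::SimpleResource& res ) const {
> > > >             return m_cache.value( res );
> > > >         }
> > > > 
> > > >         // Remembers the final uri once the store job reported it.
> > > >         void insert( const Nepomuk2::SimpleResource& res, const QUrl& uri ) {
> > > >             m_cache.insert( res, uri );
> > > >         }
> > > > 
> > > >     private:
> > > >         QHash<Nepomuk2::SimpleResource, QUrl> m_cache;
> > > >     };
> > > > 
> > > > Using it in the feeder then boils down to a few lines per resource:
> > > > 
> > > >     const QUrl cached = m_propertyCache.lookup( icon );
> > > >     if ( !cached.isEmpty() )
> > > >         res.addProperty( NAO::prefSymbol(), cached ); // reuse the stored resource
> > > >     else
> > > >         graph << icon; // store it, cache the mapping once the job finishes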
> > > 
> > > The design seems pretty cool.
> > 
> > Thanks ;-)
> > 
> > > > I know I traded some tweakability and performance costs for that
> > > > design, but as long as no real issues occur with it I consider it
> > > > superior (it seems to get the job done pretty well so far).
> > > > 
> > > > In any case, I knew what you meant when you suggested caching and
> > > > deliberately did it differently =)
> > > > I'd be interested in what you think about it anyway ;-)
> > > 
> > > For now, I think looking up the entire SimpleResource would be very
> > > expensive. It would consist of one hash, which involves iterating over
> > > all the properties and values, and then an operator== check, which
> > > requires another iteration over all the properties. That means a ton
> > > of QUrl and Soprano::Node comparisons, none of which are cheap.
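> > > 
> > > To spell out what such a lookup implies (a rough sketch, assuming
> > > SimpleResource exposes its property hash via properties(); not actual
> > > code from the branch):
> > > 
> > >     // Keying a QHash on a full SimpleResource needs both of these,
> > >     // and each of them walks over every property of the resource.
> > >     uint qHash( const Nepomuk2::SimpleResource& res )
> > >     {
> > >         uint hash = 0;
> > >         QHashIterator<QUrl, QVariant> it( res.properties() );
> > >         while( it.hasNext() ) {
> > >             it.next();
> > >             hash ^= qHash( it.key() ) ^ qHash( it.value().toString() );
> > >         }
> > >         return hash;
> > >     }
> > > 
> > >     bool operator==( const Nepomuk2::SimpleResource& a,
> > >                      const Nepomuk2::SimpleResource& b )
> > >     {
> > >         // called on every hash-bucket collision
> > >         return a.properties() == b.properties();
> > >     }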
> > 
> > I thought so too at first, but figured I'd give it a shot anyway, as I
> > have misjudged the cost of certain operations before (or underestimated
> > my CPU). Fact is, I don't even see any activity from the nepomukfeeder
> > process while indexing a bunch of mails, and virtuoso still maxes out a
> > core. This might be different on weaker CPUs, but I think the hash
> > calculation is just not that expensive.
> > 
> > If somebody on an older machine could give it a shot, that would be nice.
> > 
> > It's easy to test using the nepomukpimindexerutility in
> > kdepim-runtime/agents/nepomukfeeder/util:
> > * try master first:
> >   * select a bunch of messages
> >   * right-click and select re-index
> >   * note the time it took
> > * then check out the branch with the cache:
> >   * retry the same set of messages
> >   * watch the akonadi_nepomuk_feeder process for activity while indexing
> >   * notice that only a third of the time is needed to index
> 
> With my hobby-performance-wizard hat on: why did you not write a proper
> QBENCHMARK for this scenario (yet)? Is it because the whole PIM stack is too
> complicated to model? There are unit tests though, so writing a benchmark
> should also be possible in theory, no?
> 

The unit tests aren't really unit tests, as they work on the normal
akonadi/nepomuk databases and involve all parts of the system. I have
failed to properly unit-test this code so far, as it is pretty difficult
to decouple the functionality from akonadi, and the alternative of
running separate akonadi/nepomuk instances isn't really attractive for
all tests either.

For a benchmark we'd have similar problems, and when talking about the
cache only, it would just compare the cache's performance to not caching
at all, as the cost of not having the cache ends up on the nepomuk side.

A benchmark which just measures the round-trip time of indexing a
couple of items repeatedly would be possible; I was just too lazy to
write it so far / didn't see the need for it yet. It would essentially
automate the manual steps above (roughly sketched below).
That wouldn't give us any information about how the load between nepomuk
and the feeder has changed, though.
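
Something like the following could work (a rough sketch, untested;
reindexAndWait() is a made-up placeholder for code that would trigger the
feeder's re-index action for a fixed set of items and block until it
finishes):

    #include <QtTest>
    #include <QObject>

    // Placeholder: would select a fixed set of items, trigger the
    // feeder's re-index action and block until indexing has finished.
    static void reindexAndWait()
    {
    }

    class IndexingBenchmark : public QObject
    {
        Q_OBJECT
    private Q_SLOTS:
        void benchmarkRoundTrip()
        {
            // QBENCHMARK repeats the block and reports comparable numbers
            QBENCHMARK {
                reindexAndWait();
            }
        }
    };

    QTEST_MAIN(IndexingBenchmark)
    #include "indexingbenchmark.moc" // hypothetical file name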

I quickly ran the pimindexerutility through VTune with the following
result (I zoomed in on the indexing):
http://tinypic.com/r/o9pid4/6

As you can see, the akonadi framework is more of a performance problem
(due to the fetching of the items) than the hashing, so I think it's
safe to assume that the solution works well enough for the time being.
Note also that the spikes you see are akonadi-related and have nothing
to do with the hashing.

There are still many other performance improvements to do in other
places ;-)

> The huge advantage is that you get comparable numbers, and can make proper
> before/after judgements.
> 
> This is not to say that your investigation is wrong; if virtuoso still hogs
> the CPU and the other process doesn't even show much of a load spike, you
> shouldn't start optimizing it (the usual 90% vs 10% mantra).
> 
> Cheers
> 
> > > I would ideally like to compare both solutions, but that would involve
> > > coding both of them.
> > 
> > Unless there are actual problems with the current implementation, you
> > can't trick me into that ;-)
> > 
> > > > Cheers,
> > > > Christian
> > > > 
> > > > On Tue, Dec 4, 2012, at 05:54 PM, Vishesh Handa wrote:
> > > > > On Tue, Dec 4, 2012 at 9:52 PM, Christian Mollekopf
> > > > > <chrigi_1 at fastmail.fm> wrote:
> > > > > > > * The hashing function simply XORs the conjunction of the hashes
> > > > > > > of uri and value of each resource-property together to calculate
> > > > > > > the hash of the full resource. I don't know the math behind that
> > > > > > > but simply copied it from nepomuk-core, input would be
> > > > > > > appreciated: propertycache.cpp:50
> > > > > > 
> > > > > > The hashing function looks like this:
> > > > > > 
> > > > > >     uint hash = 0;
> > > > > >     QHashIterator<QUrl, QVariant> it( properties );
> > > > > >     while( it.hasNext() ) {
> > > > > >         it.next();
> > > > > >         hash ^= qHash( it.key() ) & qHash( it.value().toString() );
> > > > > >     }
> > > > > >     return hash;
> > > > > 
> > > > > I haven't looked at the code properly, but I'm surprised that you're
> > > > > hashing the entire SimpleResource, because that kind of misses the
> > > > > point of having application-specific caches on the client side.
> > > > > 
> > > > > I was hoping you would have separate caches for each of these:
> > > > > 
> > > > > * nco:EmailAddress
> > > > > * nco:Contact
> > > > > * nao:FreeDesktopIcon
> > > > > * nmo:MessageHeader
> > > > > 
> > > > > For email, icon, and message header it would be a simple
> > > > > QHash<QString, QUrl>. You could convert NepomukFeederUtils to a class
> > > > > (singleton, easier that way) and make it keep 4 separate caches. It
> > > > > could have functions similar to the ones in the NepomukFeederUtils
> > > > > namespace, but they would use the cache.
> > > > > 
> > > > > You could also move addGraphToNepomuk from nepomukhelpers.cpp to it.
> > > > > Something like this -
> > > > > 
> > > > > class NepomukFeederUtils : public QObject {
> > > > >     Q_OBJECT
> > > > > public:
> > > > >     void setIcon( const QString& iconName, Nepomuk2::SimpleResource& res,
> > > > >                   Nepomuk2::SimpleResourceGraph& graph )
> > > > >     {
> > > > >         if( m_iconCache.contains( iconName ) ) {
> > > > >             res.addProperty( NAO::prefSymbol(), m_iconCache.value( iconName ) );
> > > > >         } else {
> > > > >             Nepomuk2::SimpleResource icon;
> > > > >             icon.addType( NAO::FreeDesktopIcon() );
> > > > >             icon.setProperty( NAO::iconName(), iconName );
> > > > >             res.addProperty( NAO::prefSymbol(), icon.uri() );
> > > > >             graph << icon;
> > > > >             // The icon uri will be of the form _:adfasd
> > > > >             m_tempIconCache.insert( iconName, icon.uri() );
> > > > >         }
> > > > >     }
> > > > > 
> > > > >     KJob* addGraphToNepomuk( const Nepomuk2::SimpleResourceGraph& graph )
> > > > >     {
> > > > >         KJob* job = graph.save();
> > > > >         connect( job, SIGNAL(finished(KJob*)), this, SLOT(slotJobSaved(KJob*)) );
> > > > >         return job;
> > > > >     }
> > > > > 
> > > > > private Q_SLOTS:
> > > > >     void slotJobSaved( KJob* job_ )
> > > > >     {
> > > > >         Nepomuk2::StoreResourcesJob* job = static_cast<Nepomuk2::StoreResourcesJob*>( job_ );
> > > > > 
> > > > >         // job->mappings() maps the temporary _: uris to the final
> > > > >         // resource uris, so resolve each cached icon name through it
> > > > >         QHashIterator<QString, QUrl> iter( m_tempIconCache );
> > > > >         while( iter.hasNext() ) {
> > > > >             iter.next();
> > > > >             m_iconCache.insert( iter.key(), job->mappings().value( iter.value() ) );
> > > > >         }
> > > > >         m_tempIconCache.clear();
> > > > >     }
> > > > > 
> > > > > private:
> > > > >     QHash<QString, QUrl> m_iconCache;
> > > > >     QHash<QString, QUrl> m_tempIconCache;
> > > > > };
> > > > > 
> > > > > 
> > > > > uhh. Maybe I should look at the code. Maybe you would want to combine
> > > > > m_iconCache and m_tempIconCache into one, but then you could only run
> > > > > one job at a time.
> > > > > 
> > > > > > The (hash ^= newhash) part should be ok, but I don't think the
> > > > > > conjunction is, and I think it would make more sense to XOR that as
> > > > > > well: ANDing the two hashes biases every bit towards 0 (a bit ends
> > > > > > up set with probability 1/4 instead of 1/2), which increases
> > > > > > collisions. That the hash is initialized with 0 doesn't hurt IMO,
> > > > > > as each bit still has a 50/50 chance of being 0 or 1 when using an
> > > > > > XOR.
> > > > > > 
> > > > > > So I'll change this to:
> > > > > >     uint hash = 0;
> > > > > >     QHashIterator<QUrl, QVariant> it( properties );
> > > > > >     while( it.hasNext() ) {
> > > > > >         it.next();
> > > > > >         hash ^= qHash( it.key() ) ^ qHash( it.value().toString() );
> > > > > >     }
> > > > > >     return hash;
> > > > > > 
> > > > > > Note that XOR is commutative, so the hash does not preserve the
> > > > > > information of order, but that should be fine for our use case.
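> > > > > > 
> > > > > > A standalone snippet (just an illustration, not part of the patch)
> > > > > > that shows the bias empirically: combining two random 32-bit hashes
> > > > > > with & leaves roughly 8 bits set on average, while ^ leaves roughly
> > > > > > 16:
> > > > > > 
> > > > > >     #include <QHash>
> > > > > >     #include <QString>
> > > > > >     #include <QDebug>
> > > > > >     #include <cstdlib>
> > > > > > 
> > > > > >     int main()
> > > > > >     {
> > > > > >         double andBits = 0, xorBits = 0;
> > > > > >         const int runs = 100000;
> > > > > >         for ( int i = 0; i < runs; ++i ) {
> > > > > >             const uint a = qHash( QString::number( rand() ) );
> > > > > >             const uint b = qHash( QString::number( rand() ) );
> > > > > >             // count how many bits each combination leaves set
> > > > > >             andBits += __builtin_popcount( a & b );
> > > > > >             xorBits += __builtin_popcount( a ^ b );
> > > > > >         }
> > > > > >         qDebug() << "average set bits with &:" << andBits / runs // ~8
> > > > > >                  << "with ^:" << xorBits / runs;                 // ~16
> > > > > >         return 0;
> > > > > >     }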
> > > > > 
> > > > > --
> > > > > Vishesh Handa
> -- 
> Milian Wolff
> mail at milianw.de
> http://milianw.de
_______________________________________________
KDE PIM mailing list kde-pim at kde.org
https://mail.kde.org/mailman/listinfo/kde-pim
KDE PIM home page at http://pim.kde.org/


