[Kde-pim] Nepomukfeeder Caching

Milian Wolff mail at milianw.de
Tue Dec 4 19:15:49 GMT 2012


On Tuesday 04 December 2012 19:44:56 Christian Mollekopf wrote:
> On Tue, Dec 4, 2012, at 07:19 PM, Vishesh Handa wrote:
> > On Tue, Dec 4, 2012 at 11:24 PM, Christian Mollekopf
> > <chrigi_1 at fastmail.fm> wrote:
> > > Hey,
> > > 
> > > I know you suggested little specific caches for the individual
> > > properties, but IMO the PropertyCache solves the problem in a much
> > > more elegant way. You may remember that I suggested doing the caching
> > > inside nepomuk; the property cache is almost what you'd need for
> > > that, apart from the code which figures out which properties to cache
> > > (I'm not suggesting you should do that now).
> > > 
> > > What I especially like about it is that it takes like 4 lines of code
> > > in a single place to add caching to the whole feeder, which could
> > > cache any number of properties, so I'm really quite happy with the
> > > result. Also the whole cache would be fully reusable.
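> > > 
> > > Roughly, the idea is something like this (a simplified sketch, the
> > > member names here are made up; the real code lives in
> > > propertycache.cpp):
> > > 
> > >     // caches the final uri of a resource, keyed by a hash over all
> > >     // of its properties, so identical sub-resources (icons, email
> > >     // addresses, ...) are only stored once
> > >     class PropertyCache {
> > >     public:
> > >         bool contains( const Nepomuk2::SimpleResource& res ) const {
> > >             return m_cache.contains( hashProperties( res ) );
> > >         }
> > >         QUrl cachedUri( const Nepomuk2::SimpleResource& res ) const {
> > >             return m_cache.value( hashProperties( res ) );
> > >         }
> > >         void insert( const Nepomuk2::SimpleResource& res, const QUrl& uri ) {
> > >             m_cache.insert( hashProperties( res ), uri );
> > >         }
> > >     private:
> > >         // xor-combines the hashes of all property/value pairs
> > >         static uint hashProperties( const Nepomuk2::SimpleResource& res );
> > >         QHash<uint, QUrl> m_cache;
> > >     };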
> > 
> > The design seems pretty cool.
> 
> Thanks ;-)
> 
> > > I know I traded some tweakability and performance costs for that
> > > design, but as long as no real issues occur with it I consider it
> > > superior (it seems to get the job done pretty well so far).
> > > 
> > > In any case, I know what you meant when you suggested caching and did
> > > it deliberately otherwise =)
> > > I'd be interested in what you think about it anyways ;-)
> > 
> > For now, I think looking up the entire SimpleResource would be very
> > expensive. It would consist of one hash, which would involve iterating
> > over all the properties and values, and then doing an operator==,
> > which would require another iteration over all the properties. This
> > would involve a ton of QUrl and Soprano::Node comparisons, none of
> > which are that cheap.
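> > 
> > Spelled out, a lookup in a hash keyed on the resource itself does
> > roughly this (a sketch; qHash here stands for the property-based hash
> > copied from nepomuk-core):
> > 
> >     // 1. hash the candidate: iterates every property and value once
> >     const uint h = qHash( resource );
> >     // 2. on a bucket hit, verify equality: iterates all properties
> >     //    again, comparing the QUrl keys and the values pairwise
> >     const bool hit = ( cachedResource == resource );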
> 
> I thought so too at first, but figured I'd give it a shot anyways, as I
> have misjudged the cost of certain operations before (or underestimated
> my cpu). Fact is, I don't even see any activity of the nepomukfeeder
> process while indexing a bunch of mails, and virtuoso still maxes out a
> core. This might be different on weaker cpus, but I think the hash
> calculation is just not that expensive.
> 
> If somebody on an older machine could give it a shot that would be nice.
> 
> It's easy to test using the nepomukupimindexerutility in
> kdepim-runtime/agents/nepomukfeeder/util:
> * try master first:
> * Select a bunch of messages
> * right-click and select re-index
> * look at the time it took
> * checkout the branch with the cache
> * retry the same set of messages
> * watch the akonadi_nepomuk_feeder process for activity while indexing
> * notice that only one third of the time was used to index

With my hobby-performance-wizard hat on: why did you not write a proper 
QBENCHMARK for this scenario (yet)? Is it because the whole PIM stack is 
too complicated to model? There are unit tests though, so writing a 
benchmark should also be possible in theory, no?

The huge advantage is that you get comparable numbers, and can do proper 
before/after judgements.
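
A minimal sketch of what such a benchmark could look like (the helper
names here are hypothetical; a real test would have to set up the feeder
pipeline first):

    #include <QtTest>

    class FeederBenchmark : public QObject
    {
        Q_OBJECT
    private Q_SLOTS:
        void benchmarkIndexing()
        {
            // hypothetical helper building a representative set of mails
            const Nepomuk2::SimpleResourceGraph graph = buildGraphForTestMails();
            QBENCHMARK {
                // hypothetical: push the graph through the feeder once
                indexGraph( graph );
            }
        }
    };

    QTEST_MAIN(FeederBenchmark)
    #include "feederbenchmark.moc"

QBENCHMARK repeats the measured block until the timing stabilizes and
reports a per-iteration number, which is exactly what you need for
before/after comparisons.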

This is not to say that your investigation is wrong: if virtuoso still 
hogs the cpu and the other process doesn't even show much of a load 
spike, you shouldn't start optimizing it (the usual 90% vs. 10% mantra).

Cheers

> > I would ideally like to compare both the solutions, but that would
> > involve coding both of them.
> 
> Unless there are actual problems with the current implementation you
> can't trick me into that ;-)
> 
> > > Cheers,
> > > Christian
> > > 
> > > On Tue, Dec 4, 2012, at 05:54 PM, Vishesh Handa wrote:
> > > > On Tue, Dec 4, 2012 at 9:52 PM, Christian Mollekopf
> > > > <chrigi_1 at fastmail.fm> wrote:
> > > > > > * The hashing function simply xor's the conjunction of the
> > > > > >   hashes of uri and value of each resource-property together to
> > > > > >   calculate the hash of the full resource. I don't know the
> > > > > >   math behind that but simply copied it from nepomuk-core,
> > > > > >   input would be appreciated: propertycache.cpp:50
> > > > > 
> > > > > The hashing function looks like this:
> > > > > 
> > > > >     uint hash = 0;
> > > > >     QHashIterator<QUrl, QVariant> it( properties );
> > > > >     while( it.hasNext() ) {
> > > > >         it.next();
> > > > >         hash ^= qHash( it.key() ) & qHash( it.value().toString() );
> > > > >     }
> > > > >     return hash;
> > > > 
> > > > I haven't looked at the code properly, but I'm surprised that
> > > > you're hashing the entire SimpleResource, because that kind of
> > > > misses the point of having application specific caches on the
> > > > client side.
> > > > 
> > > > I was hoping you would have separate caches for each of these:
> > > > 
> > > > * nco:EmailAddress
> > > > * nco:Contact
> > > > * nao:FreeDesktopIcon
> > > > * nmo:MessageHeader
> > > > 
> > > > For the email, icon, and message header it would be a simple
> > > > QHash<QString, QUrl>. You could convert NepomukFeederUtils into a
> > > > class (a singleton, easier that way) and make it keep 4 separate
> > > > caches. It could have the same functions as the current
> > > > NepomukFeederUtils namespace, but they would use the caches.
> > > > 
> > > > You could also move addGraphToNepomuk from nepomukhelpers.cpp to it.
> > > > Something like this -
> > > > 
> > > > class NepomukFeederUtils : public QObject {
> > > >     Q_OBJECT
> > > > public:
> > > >     void setIcon( const QString& iconName, Nepomuk2::SimpleResource& res,
> > > >                   Nepomuk2::SimpleResourceGraph& graph )
> > > >     {
> > > >         if( m_iconCache.contains( iconName ) ) {
> > > >             // cache hit: reuse the final uri of the stored icon
> > > >             res.addProperty( NAO::prefSymbol(), m_iconCache.value( iconName ) );
> > > >         }
> > > >         else {
> > > >             Nepomuk2::SimpleResource icon;
> > > >             icon.addType( NAO::FreeDesktopIcon() );
> > > >             icon.setProperty( NAO::iconName(), iconName );
> > > >             // the icon uri will be of the form _:adfasd until the graph is saved
> > > >             res.addProperty( NAO::prefSymbol(), icon.uri() );
> > > >             graph << icon;
> > > >             m_tempIconCache.insert( iconName, icon.uri() );
> > > >         }
> > > >     }
> > > > 
> > > >     KJob* addGraphToNepomuk( const Nepomuk2::SimpleResourceGraph& graph )
> > > >     {
> > > >         KJob* job = graph.save();
> > > >         connect( job, SIGNAL(finished(KJob*)), this, SLOT(slotJobSaved(KJob*)) );
> > > >         return job;
> > > >     }
> > > > 
> > > > private Q_SLOTS:
> > > >     void slotJobSaved( KJob* job_ )
> > > >     {
> > > >         Nepomuk2::StoreResourcesJob* job = static_cast<Nepomuk2::StoreResourcesJob*>( job_ );
> > > >         // mappings() maps the temporary _: uris to the final resource
> > > >         // uris, so resolve each cached icon name to its final uri
> > > >         QHashIterator<QString, QUrl> iter( m_tempIconCache );
> > > >         while( iter.hasNext() ) {
> > > >             iter.next();
> > > >             m_iconCache.insert( iter.key(), job->mappings().value( iter.value() ) );
> > > >         }
> > > >         m_tempIconCache.clear();
> > > >     }
> > > > 
> > > > private:
> > > >     QHash<QString, QUrl> m_iconCache;     // iconName -> final resource uri
> > > >     QHash<QString, QUrl> m_tempIconCache; // iconName -> temporary _: uri
> > > > };
> > > > 
> > > > 
> > > > uhh. Maybe I should look at the code. Maybe you would want to
> > > > combine both the m_iconCache and m_tempIconCache into one, but then
> > > > you can only run one job at a time.
> > > > 
> > > > > The (hash ^= newhash) should be ok, but I don't think the
> > > > > conjunction is, and I think it would make more sense to XOR that
> > > > > as well. That the hash is initialized with 0 doesn't hurt IMO, as
> > > > > there is a 50/50 chance for each bit to be 0 or 1 when using an
> > > > > XOR.
> > > > > 
> > > > > So I'll change this to:
> > > > > 
> > > > >     uint hash = 0;
> > > > >     QHashIterator<QUrl, QVariant> it( properties );
> > > > >     while( it.hasNext() ) {
> > > > >         it.next();
> > > > >         hash ^= qHash( it.key() ) ^ qHash( it.value().toString() );
> > > > >     }
> > > > >     return hash;
> > > > > 
> > > > > Note that XOR is commutative, so the hash does not preserve any
> > > > > ordering information, but that should be fine for our use case.
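> > > > > 
> > > > > A back-of-the-envelope way to see why the conjunction was off:
> > > > > for two independent uniformly random bits a and b, (a & b) is 1
> > > > > with probability 1/4, so the AND drags every hash bit towards 0
> > > > > and clusters the hashes, while (a ^ b) is 1 with probability 1/2,
> > > > > so the XOR keeps each bit uniformly distributed.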
> > > > 
> > > > --
> > > > Vishesh Handa
-- 
Milian Wolff
mail at milianw.de
http://milianw.de