[Kde-pim] Nepomukfeeder Caching

Christian Mollekopf chrigi_1 at fastmail.fm
Tue Dec 4 18:44:56 GMT 2012



On Tue, Dec 4, 2012, at 07:19 PM, Vishesh Handa wrote:
> On Tue, Dec 4, 2012 at 11:24 PM, Christian Mollekopf
> <chrigi_1 at fastmail.fm> wrote:
> 
> > Hey,
> >
> > I know you suggested small, specific caches for the individual
> > properties, but IMO the PropertyCache solves the problem in a much more
> > elegant way. You may remember that I suggested doing the caching inside
> > Nepomuk; the property cache is almost what you'd need for that, apart
> > from the code which figures out which properties to cache (I'm not
> > suggesting you should do that now).
> >
> > What I especially like about it is that it takes about four lines of
> > code in a single place to add caching to the whole feeder, and it can
> > cache any number of properties, so I'm really quite happy with the
> > result. Also, the whole cache would be fully reusable.
> >
> 
> The design seems pretty cool.
> 
Thanks ;-)
> 
> >
> > I know I traded some tweakability and performance costs for that design,
> > but as long as no real issues occur with it I consider it superior (it
> > seems to get the job done pretty well so far).
> >
> > In any case, I know what you meant when you suggested caching and did it
> > deliberately otherwise =)
> > I'd be interested in what you think about it anyway ;-)
> >
> 
> For now, I think looking up the entire SimpleResource would be very
> expensive. It would consist of one hash, which would involve iterating
> over all the properties and values, and then doing an operator==, which
> would require another iteration over all the properties. This would
> involve a ton of QUrl and Soprano::Node comparisons, none of which are
> that cheap.
> 

I thought so too at first, but figured I'd give it a shot anyway, as I have
misjudged the cost of certain operations before (or underestimated my CPU).
Fact is, I don't even see any activity from the nepomukfeeder process while
indexing a bunch of mails, and virtuoso still maxes out a core. This might
be different on weaker CPUs, but I think the hash calculation is just not
that expensive.

If somebody on an older machine could give it a shot that would be nice.

It's easy to test using the nepomukpimindexerutility in
kdepim-runtime/agents/nepomukfeeder/util:
* On master:
  * select a bunch of messages
  * right-click and select re-index
  * note the time it took
* On the branch with the cache:
  * retry the same set of messages
  * watch the akonadi_nepomuk_feeder process for activity while indexing
  * note that only a third of the time is needed for indexing

> I would ideally like to compare both solutions, but that would involve
> coding both of them.
> 

Unless there are actual problems with the current implementation you
can't trick me into that ;-)

> 
> > Cheers,
> > Christian
> >
> > On Tue, Dec 4, 2012, at 05:54 PM, Vishesh Handa wrote:
> > > On Tue, Dec 4, 2012 at 9:52 PM, Christian Mollekopf
> > > <chrigi_1 at fastmail.fm> wrote:
> > >
> > > >
> > > >
> > > >
> > > > >
> > > > > * The hashing function simply XORs together the conjunction
> > > > >   (bitwise AND) of the hashes of the URI and the value of each
> > > > >   resource property, to calculate the hash of the full resource.
> > > > >   I don't know the math behind that but simply copied it from
> > > > >   nepomuk-core; input would be appreciated: propertycache.cpp:50
> > > > >
> > > >
> > > > The hashing function looks like this:
> > > >
> > > >     uint hash = 0;
> > > >     QHashIterator<QUrl, QVariant> it( properties );
> > > >     while( it.hasNext() ) {
> > > >         it.next();
> > > >         hash ^= qHash( it.key() ) & qHash( it.value().toString() );
> > > >     }
> > > >     return hash;
> > > >
> > >
> > > I haven't looked at the code properly, but I'm surprised that you're
> > > hashing the entire SimpleResource, because that kind of misses the
> > > point of having application-specific caches on the client side.
> > >
> > > I was hoping you would have separate caches for each of these:
> > >
> > > * nco:EmailAddress
> > > * nco:Contact
> > > * nao:FreeDesktopIcon
> > > * nmo:MessageHeader
> > >
> > > For email, icon, and message header it will be a simple
> > > QHash<QString, QUrl>. You could convert NepomukFeederUtils to a class
> > > (a singleton, easier that way) and have it keep 4 separate caches. It
> > > could have functions similar to the ones in the NepomukFeederUtils
> > > namespace, but they would use the cache.
> > >
> > > You could also move addGraphToNepomuk from nepomukhelpers.cpp to it.
> > > Something like this -
> > >
> > > class NepomukFeederUtils : public QObject {
> > >     Q_OBJECT
> > > public:
> > >     void setIcon( const QString& iconName, Nepomuk2::SimpleResource& res,
> > >                   Nepomuk2::SimpleResourceGraph& graph ) {
> > >         if( m_iconCache.contains(iconName) ) {
> > >             res.addProperty( NAO::prefSymbol(), m_iconCache.value(iconName) );
> > >         }
> > >         else {
> > >             SimpleResource icon;
> > >             icon.addType( NAO::FreeDesktopIcon() );
> > >             icon.setProperty( NAO::iconName(), iconName );
> > >             res.addProperty( NAO::prefSymbol(), icon );
> > >             graph << icon;
> > >
> > >             // The icon uri will be of the form _:adfasd
> > >             m_tempIconCache.insert( icon.uri(), iconName );
> > >         }
> > >     }
> > >
> > >     KJob* addGraphToNepomuk( const SimpleResourceGraph& graph ) {
> > >         KJob* job = graph.save();
> > >         connect( job, SIGNAL(finished(KJob*)), this, SLOT(slotJobSaved(KJob*)) );
> > >         return job;
> > >     }
> > >
> > > private slots:
> > >     void slotJobSaved( KJob* job_ ) {
> > >         StoreResourcesJob* job = static_cast<StoreResourcesJob*>(job_);
> > >
> > >         // mappings() maps each blank uri to the final resource uri
> > >         QHashIterator<QUrl, QUrl> iter( job->mappings() );
> > >         while( iter.hasNext() ) {
> > >             iter.next();
> > >             if( m_tempIconCache.contains(iter.key()) )
> > >                 m_iconCache.insert( m_tempIconCache.value(iter.key()), iter.value() );
> > >         }
> > >         m_tempIconCache.clear();
> > >     }
> > >
> > > private:
> > >     QHash<QString, QUrl> m_iconCache;     // iconName -> final resource uri
> > >     QHash<QUrl, QString> m_tempIconCache; // blank uri -> iconName
> > > };
> > >
> > >
> > > Uhh, maybe I should look at the code. You might want to combine
> > > m_iconCache and m_tempIconCache into one, but then you could only run
> > > one job at a time.
> > >
> > >
> > > > The (hash ^= newhash) part should be ok, but I don't think the
> > > > conjunction is; I think it would make more sense to XOR that as
> > > > well. That the hash is initialized with 0 doesn't hurt IMO, as with
> > > > XOR each bit still has a 50/50 chance of ending up 0 or 1.
> > > >
> > > > So I'll change this to:
> > > >
> > > >     uint hash = 0;
> > > >     QHashIterator<QUrl, QVariant> it( properties );
> > > >     while( it.hasNext() ) {
> > > >         it.next();
> > > >         hash ^= qHash( it.key() ) ^ qHash( it.value().toString() );
> > > >     }
> > > >     return hash;
> > > >
> > > > Note that XOR is commutative, so the hash does not preserve ordering
> > > > information, but that should be fine for our use case.
> > > > _______________________________________________
> > > > KDE PIM mailing list kde-pim at kde.org
> > > > https://mail.kde.org/mailman/listinfo/kde-pim
> > > > KDE PIM home page at http://pim.kde.org/
> > > >
> > >
> > >
> > >
> > > --
> > > Vishesh Handa
> >
> 
> 
> 
> -- 
> Vishesh Handa


