Permission to break feature freeze for Nepomuk and Soprano

Mon Sep 3 17:49:13 BST 2007

On Monday 03 September 2007 18:53:27 Richard Dale wrote:
> On Monday 03 September 2007, Sebastian Trüg wrote:
> > Hi guys,
> >
> > there has been a long silence around Nepomuk. Well, the main reason is
> > that I was working heavily on Soprano2 [1]. It comes with a bunch of new
> > features, a much cleaner API, a server/client architecture (quite simple
> > but waaaay faster than DBus), and I intend to replace the Nepomuk
> > middleware with it. I already did it locally and a commit "should" not
> > break anything once Soprano2 is copied to kdesupport.
> >
> > The main advantage of all this is: speed. No more DBus for requesting
> > data. It is now done via tcp (and I would love some tips and help to add
> > support for unix socket communication)
> >
> > If there are no objections until this evening I will proceed (I know it
> > is short notice but I don't think it is such a big deal yet.)
> >
> > Cheers,
> > Sebastian
> >
> > [1] branches/work/sorano2
>
> I don't know about speed, but from what I've seen of the SPARQL query api
> it only returns results as C++ over dbus (as a
> Nepomuk::RDF::QueryResultTable). The query itself is a text string which is
> language independent and fine. So the results are retrieved as xml text,
> marshalled to c++, then marshalled to the dbus wire format. At the other
> end, they are put back into the original c++ classes. In ruby there would be
> a further step where those c++ instances would be wrapped with Ruby
> classes.
>
> What I would prefer is that the query results are sent as an xml
> string 'application/sparql-results+xml' over dbus and then the client can
> choose what it wants to do with it, depending on what language the client
> is written in. For instance, a C++ client could convert the xml to
> Nepomuk::RDF::QueryResultTable, or a ruby client could convert to the
> ActiveRDF result format. Maybe there is a problem because Nepomuk needs to
> return quads when named graphs are used, instead of just triples - I'm not
> enough of an expert to know if using sparql/xml for the result sets would
> cause problems with that.

You underestimate the overhead of converting to and from XML on top of the 
DBus overhead. In some cases, such as PIM, using Nepomuk with all this 
overhead would be about as sane as accessing the filesystem via DBus and XML.

Retrieving metadata for a single file is not a problem with either approach, 
but under heavy load it can be (and if there aren't any heavy-usage 
scenarios, why bother with unneeded stuff at all?).

How significant a performance penalty are we talking about? I haven't done 
any benchmarks. Sebastian says it makes a big difference, and so far I 
personally have no reason not to trust him on this.

Still, it would be nice to see the actual numbers.

> I am keen to be able to use jabber XMPP as a transport for peer to peer
> SPARQL queries between KDE users if they wanted to exchange their own
> meta-data in their local triple stores, such as restaurant recommendations
> or whatever. So I think Nepomuk should be as transport and language
> independent as possible, and using xml as the result set format makes that
> easier.

A valid point. Dropping the XML interface completely is quite likely not OK, 
but making it the primary one seems unrealistic.

The most typical and performance-demanding use case will be a C++ client 
talking to Nepomuk through the native interface, and the TCP link will 
likely improve this greatly. It's easy to add an XML serialization on top of 
the C++ classes if anyone needs it.
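
To illustrate what such a layer might look like, here is a rough sketch that 
writes a result table out as 'application/sparql-results+xml'. Node, 
ResultTable, escapeXml and toSparqlResultsXml are hypothetical names made up 
for this mail; they are not the real Nepomuk or Soprano classes, and quads / 
named graphs are not handled at all.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for the native result classes; these are NOT the
// real Nepomuk or Soprano types.
struct Node {
    enum class Kind { Uri, Literal };
    Kind kind;
    std::string value;
};

struct ResultTable {
    std::vector<std::string> variables;   // column (variable) names
    std::vector<std::vector<Node>> rows;  // one Node per variable in each row
};

// Escape the characters that are special in XML text and attribute values.
std::string escapeXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default:  out += c;
        }
    }
    return out;
}

// Serialize the table in the SPARQL Query Results XML layout:
// a <head> listing the variables, then one <result> per row.
std::string toSparqlResultsXml(const ResultTable& table) {
    std::ostringstream xml;
    xml << "<?xml version=\"1.0\"?>\n"
        << "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">\n"
        << "  <head>\n";
    for (const std::string& name : table.variables)
        xml << "    <variable name=\"" << escapeXml(name) << "\"/>\n";
    xml << "  </head>\n  <results>\n";
    for (const auto& row : table.rows) {
        xml << "    <result>\n";
        // Ignore surplus cells if a row is longer than the variable list.
        for (std::size_t i = 0; i < row.size() && i < table.variables.size(); ++i) {
            const char* tag = row[i].kind == Node::Kind::Uri ? "uri" : "literal";
            xml << "      <binding name=\"" << escapeXml(table.variables[i])
                << "\"><" << tag << ">" << escapeXml(row[i].value)
                << "</" << tag << "></binding>\n";
        }
        xml << "    </result>\n";
    }
    xml << "  </results>\n</sparql>\n";
    return xml.str();
}
```

A client that wants XML (for XMPP or anything else) would call 
toSparqlResultsXml() on the table it already has, while native C++ clients 
never pay for the conversion.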

As for the RDF backend already returning RDF/XML, that is not necessarily so.

-- Evgeny