Permission to break feature freeze for Nepomuk and Soprano
Evgeny Egorochkin
phreedom.stdin at gmail.com
Mon Sep 3 17:49:13 BST 2007
On Monday 03 September 2007 18:53:27 Richard Dale wrote:
> On Monday 03 September 2007, Sebastian Trüg wrote:
> > Hi guys,
> >
> > there has been a long silence around Nepomuk. Well, the main reason is
> > that I was working heavily on Soprano2 [1]. It comes with a bunch of new
> > features, a much cleaner API, a server/client architecture (quite simple
> > but waaaay faster than DBus), and I intend to replace the Nepomuk
> > middleware with it. I already did it locally and a commit "should" not
> > break anything once Soprano2 is copied to kdesupport.
> >
> > The main advantage of all this is: speed. No more DBus for requesting
> > data. It is now done via tcp (and I would love some tips and help to add
> > support for unix socket communication)
> >
> > If there are no objections until this evening I will proceed (I know it
> > is short notice but I don't think it is such a big deal yet.)
> >
> > Cheers,
> > Sebastian
> >
> > [1] branches/work/sorano2
>
> I don't know about speed, but from what I've seen of the SPARQL query api
> it only returns results as C++ over dbus (as a
> Nepomuk::RDF::QueryResultTable). The query itself is a text string which is
> language independent and fine. So the results are retrieved as xml text,
> marshalled to c++, then marshalled to the dbus wire format. At the other
> end, they are put back into the original c++ classes. In ruby there would be
> a further step where those c++ instances would be wrapped with Ruby
> classes.
>
> What I would prefer is that the query results are sent as an xml
> string 'application/sparql-results+xml' over dbus and then the client can
> choose what it wants to do with it, depending on what language the client
> is written in. For instance, a C++ client could convert the xml to
> Nepomuk::RDF::QueryResultTable, or a ruby client could convert to the
> ActiveRDF result format. Maybe there is a problem because Nepomuk needs to
> return quads when named graphs are used, instead of just triples - I'm not
> enough of an expert to know if using sparql/xml for the result sets would
> cause problems with that.
You underestimate the performance overhead of converting to and from XML, on
top of the DBus overhead. In certain cases, such as PIM, using Nepomuk with all
this overhead would be about as sane as accessing the filesystem via DBus and
XML.

Retrieving metadata for a single file is not a problem with either approach,
but under heavy load it can be (and if there aren't any heavy-usage scenarios,
why bother with unneeded stuff at all?).

How significant a performance penalty are we talking about? I didn't run any
benchmarks. Sebastian says it makes a big difference, and so far I personally
have no reason not to trust him on this.

Still, it would be nice to see the actual numbers.
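
For anyone who wants to produce such numbers, here is a rough sketch of how the
XML round trip could be timed against simply handing over native objects. It is
only an illustration in Python: the function names (make_rows, xml_round_trip,
native_copy, measure) and the simplified XML layout are assumptions made up for
this example, it does not touch DBus, Soprano or Nepomuk, and the timings it
prints say nothing directly about the C++ code paths discussed here.

```python
import timeit
import xml.etree.ElementTree as ET


def make_rows(n):
    """Build n fake (subject URI, tag) result rows. Purely synthetic data."""
    return [("file:///tmp/doc%d" % i, "tag%d" % (i % 7)) for i in range(n)]


def xml_round_trip(rows):
    """Serialize rows to an XML string and parse them back again."""
    root = ET.Element("results")
    for subject, tag in rows:
        result = ET.SubElement(root, "result")
        ET.SubElement(result, "uri").text = subject
        ET.SubElement(result, "literal").text = tag
    text = ET.tostring(root, encoding="unicode")
    parsed = ET.fromstring(text)
    return [(result[0].text, result[1].text) for result in parsed]


def native_copy(rows):
    """Stand-in for passing native objects around: a plain list copy."""
    return list(rows)


def measure(n_rows=1000, repeats=20):
    """Return average seconds per call for (XML round trip, native copy)."""
    rows = make_rows(n_rows)
    t_xml = timeit.timeit(lambda: xml_round_trip(rows), number=repeats)
    t_native = timeit.timeit(lambda: native_copy(rows), number=repeats)
    return t_xml / repeats, t_native / repeats


if __name__ == "__main__":
    t_xml, t_native = measure()
    print("XML round trip: %.6f s per call" % t_xml)
    print("native copy:    %.6f s per call" % t_native)
```

A real comparison would have to measure the actual transports (DBus versus the
TCP link) with realistic result sets.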
> I am keen to be able to use jabber XMPP as a transport for peer to peer
> SPARQL queries between KDE users if they wanted to exchange their own
> meta-data in their local triple stores, such as restaurant recommendations
> or whatever. So I think Nepomuk should be as transport and language
> independent as possible, and using xml as the result set format makes that
> easier.
A valid point. Dropping the XML interface completely is quite likely not OK,
but making it the primary one seems unrealistic.

The most typical and most performance-demanding use case will be a C++ client
talking to Nepomuk through the native interface, and the TCP link will likely
improve this greatly. It's easy to add an XML serialization on top of the C++
classes if anyone needs it.
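
To illustrate the idea of layering XML on top of native result objects, here is
a minimal sketch that turns an in-memory result table into the
'application/sparql-results+xml' format and back. It is written in Python with
the standard library only, so it stands in for the C++ classes and is not
Nepomuk or Soprano code. The function names (to_sparql_xml, from_sparql_xml)
and the row layout (a dict mapping variable name to a (kind, value) pair) are
assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SPARQL_NS = "http://www.w3.org/2005/sparql-results#"


def _tag(name):
    """Return the namespace-qualified tag name ElementTree expects."""
    return "{%s}%s" % (SPARQL_NS, name)


def to_sparql_xml(variables, rows):
    """Serialize a result table to a SPARQL results XML string.

    variables: list of variable names.
    rows: list of dicts, variable name -> (kind, value), where kind is
          "uri", "literal" or "bnode" (hypothetical layout for this sketch).
    """
    ET.register_namespace("", SPARQL_NS)  # emit it as the default namespace
    root = ET.Element(_tag("sparql"))
    head = ET.SubElement(root, _tag("head"))
    for name in variables:
        ET.SubElement(head, _tag("variable"), {"name": name})
    results = ET.SubElement(root, _tag("results"))
    for row in rows:
        result = ET.SubElement(results, _tag("result"))
        for name in variables:
            if name not in row:
                continue  # unbound variable: no <binding> element
            kind, value = row[name]
            binding = ET.SubElement(result, _tag("binding"), {"name": name})
            ET.SubElement(binding, _tag(kind)).text = value
    return ET.tostring(root, encoding="unicode")


def from_sparql_xml(text):
    """Parse a SPARQL results XML string back into (variables, rows)."""
    ns = {"s": SPARQL_NS}
    root = ET.fromstring(text)
    variables = [v.get("name") for v in root.findall("s:head/s:variable", ns)]
    rows = []
    for result in root.findall("s:results/s:result", ns):
        row = {}
        for binding in result.findall("s:binding", ns):
            term = binding[0]                  # <uri>, <literal> or <bnode>
            kind = term.tag.split("}", 1)[1]   # strip the "{namespace}" part
            row[binding.get("name")] = (kind, term.text or "")
        rows.append(row)
    return variables, rows


if __name__ == "__main__":
    variables = ["file", "tag"]
    rows = [{"file": ("uri", "file:///tmp/a.txt"), "tag": ("literal", "work")}]
    print(to_sparql_xml(variables, rows))
```

A client in any language could then apply the equivalent of from_sparql_xml to
build whatever result type suits it, while native clients skip the XML step.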
As for the RDF backend already returning RDF/XML, that is not necessarily so.
-- Evgeny