[Nepomuk] Refactoring for Soprano 3

Tue Oct 20 12:43:23 CEST 2009

On Thursday 15 October 2009 21:51:24 Leo Sauermann wrote:
> Hi Sebastian,
> 
> from my experience:
> - using sesame2 as inspiration is the right way to go. try to ignore
> Jena. Dunno about virtuoso.

There is nothing to borrow from Virtuoso there. It is SQL.

> - use exactly the names you suggested, do use RepositoryFactory - this
> is correct, stick with it
> - your uml has a typo: the "Repositoy" is too good to be true.

:P

> what needs to be considered:
> * close/disconnect/dispose (however you put it). A repository connection
> must be explicitly closed.

The question is: do we actually need a close method, or is it enough to simply
delete the object? The only reason I can think of for a close method is to
match an existing isOpen or isConnected method which is necessary.
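
To make the two options concrete, here is a minimal sketch (Python for
brevity; every name in it is hypothetical and it is not the Soprano API). It
shows an explicit close() paired with is_open(), next to scope-bound cleanup,
which is the closest analogue of simply deleting the object.

```python
class RepositoryConnection:
    """Hypothetical connection object, for illustration only."""

    def __init__(self):
        self._open = True

    def is_open(self):
        return self._open

    def close(self):
        # Idempotent: closing an already closed connection is harmless.
        self._open = False

    # Scope-bound cleanup: leaving the block closes the connection,
    # much like deleting the object would.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


with RepositoryConnection() as conn:
    open_inside = conn.is_open()
open_after = conn.is_open()
```

If nothing ever needs to ask is_open(), the explicit close() adds little over
the scope-bound variant.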

> * dirty reads - the repositoryconnection must support dirty reads of its
> own changes in the listStatement AND SPARQL implementations - this is a
> major issue in nepomuk-java which I implemented in ClientSession and
> diffmodelset, and having dirty reads on your connection-aware changes
> with SPARQL support is the best way to go, although the hardest to
> achieve. It allows building nice GUIs showing the results "as if
> committed" while not having committed yet. Note that dirty reads are very
> unlikely to work for text indexing.

I think this is up to the backend. Some backends might not support that.

> * thread-safety: sesame2 leaves a lot to be desired here. My wish is:
> many many threads realized by multiple RepositoryConnections. Read/write
> must be possible for multiple connections without blocking. For one
> RepositoryConnection, single-thread safety is a good tradeoff.

That is the idea. With Virtuoso the only problem seems to be the limited
number of server threads. Maybe we need to catch that in the Soprano
server.

> * writing to the repository should not invalidate open readers and
> queries. It sucks hard when RDF apis suddenly "fail" in a half-iterated
> query result when you concurrently do changes. An iterator should be
> valid from start to end, despite doing changes to the same
> repositoryconnection while the iterator is open. If this is not
> supported, it pushes a lot of hassle to the individual developer

I think writes should not be possible on a connection with an open 
iterator/cursor. If an application needs to read and write at the same time it 
opens two connections, right?
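
A rough sketch of that rule, in Python with purely hypothetical names (not
the Soprano API). A connection refuses writes while one of its own iterators
is open, and a second connection is used for writing. Failure is reported
through the return value, and the snapshot taken when the iterator opens is an
assumption of this sketch; a real backend would rely on whatever isolation it
provides.

```python
class Store:
    def __init__(self):
        self.statements = set()

    def connect(self):
        return Connection(self)


class StatementIterator:
    def __init__(self, connection, statements):
        self._connection = connection
        self._items = iter(statements)
        self._closed = False
        connection._open_iterators += 1

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._items)
        except StopIteration:
            self.close()  # an exhausted iterator releases the connection
            raise

    def close(self):
        if not self._closed:
            self._closed = True
            self._connection._open_iterators -= 1


class Connection:
    def __init__(self, store):
        self._store = store
        self._open_iterators = 0

    def add_statement(self, statement):
        # Writes are rejected while this connection has an open iterator.
        if self._open_iterators > 0:
            return False
        self._store.statements.add(statement)
        return True

    def list_statements(self):
        # Assumption: iterate over a snapshot taken at open time.
        return StatementIterator(self, sorted(self._store.statements))


store = Store()
reader = store.connect()
writer = store.connect()
writer.add_statement(("ex:a", "ex:b", "ex:c"))

it = reader.list_statements()
blocked = reader.add_statement(("ex:x", "ex:y", "ex:z"))  # same connection
allowed = writer.add_statement(("ex:d", "ex:e", "ex:f"))  # second connection
seen = list(it)
after = reader.add_statement(("ex:x", "ex:y", "ex:z"))    # iterator is done
```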

> * listening to changes based on configurable filters - I interpret the
> "statementAdded" and "statementRemoved" methods as triggering events, I
> implemented something like "addStatementListener(listener, filter)". To
> push the filtering/notification optimisation to the store

Yes, the simple signals are probably not the best solution performance-wise.
The problem is: if we use a pattern as you propose, every backend would have
to implement this notification filtering.
That is why I thought of this two-step solution: have the simple signals in
the backends and handle the filtering on a higher level, but still in only one
place where clients can register.
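
The two-step idea could look roughly like this. It is a sketch in Python with
hypothetical names (Backend, NotificationHub, add_statement_listener), not an
existing API. The backend only emits an unfiltered signal; one hub on top of
it matches each statement against the registered patterns, where None acts as
a wildcard.

```python
class Backend:
    """Step 1: the backend only emits simple, unfiltered signals."""

    def __init__(self):
        self._statements = set()
        self._added_handlers = []

    def on_statement_added(self, handler):
        self._added_handlers.append(handler)

    def add_statement(self, statement):
        if statement in self._statements:
            return
        self._statements.add(statement)
        for handler in self._added_handlers:
            handler(statement)


class NotificationHub:
    """Step 2: one central place that does the filtering for all clients."""

    def __init__(self, backend):
        self._listeners = []
        backend.on_statement_added(self._dispatch)

    def add_statement_listener(self, listener, pattern):
        # pattern is a (subject, predicate, object) tuple; None matches anything.
        self._listeners.append((pattern, listener))

    @staticmethod
    def _matches(pattern, statement):
        return all(p is None or p == v for p, v in zip(pattern, statement))

    def _dispatch(self, statement):
        for pattern, listener in self._listeners:
            if self._matches(pattern, statement):
                listener(statement)


backend = Backend()
hub = NotificationHub(backend)
received = []
hub.add_statement_listener(received.append, (None, "rdf:type", None))
backend.add_statement(("ex:a", "rdf:type", "ex:Thing"))
backend.add_statement(("ex:a", "rdfs:label", "A"))
```

The backend stays trivial to implement, at the price of the hub seeing every
single change.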

> * I would remove all these super/sub interfaces (Filter*) and just have
> one interface of each part, let the implementation throw exceptions if
> it doesn't understand each method, nobody will bother checking the
> interfaces anyway, it's a hassle

We have no exceptions. And the Filter* stuff makes a lot of sense as it
provides default implementations for all methods. That way one can write a
simple filter and concentrate only on the one thing it should do. Otherwise
one would have to reimplement connection creation for every filter class.
BTW: The filter thing is intended to be something like the SAILs in Sesame.
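
As an illustration of why the default implementations help, here is a small
sketch in Python. The class and method names are made up for the example and
do not reflect the actual Soprano classes. The filter base class forwards
everything to its parent, so a concrete filter overrides only the one method
it cares about and reports failure through its return value instead of an
exception.

```python
class Model:
    """Hypothetical base storage."""

    def __init__(self):
        self._statements = []

    def add_statement(self, statement):
        self._statements.append(statement)
        return True

    def list_statements(self):
        return list(self._statements)

    def statement_count(self):
        return len(self._statements)


class FilterModel:
    """Default implementation: forward every call to the parent."""

    def __init__(self, parent):
        self._parent = parent

    def add_statement(self, statement):
        return self._parent.add_statement(statement)

    def list_statements(self):
        return self._parent.list_statements()

    def statement_count(self):
        return self._parent.statement_count()


class ReadOnlyFilter(FilterModel):
    """Overrides a single method; everything else is inherited."""

    def add_statement(self, statement):
        return False  # refuse the write, signalled by the return value


base = Model()
base.add_statement(("ex:a", "ex:b", "ex:c"))
guarded = ReadOnlyFilter(base)
rejected = guarded.add_statement(("ex:x", "ex:y", "ex:z"))
count = guarded.statement_count()
```

Filters can be stacked the same way, since a FilterModel can take another
FilterModel as its parent.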

> * Desire: Inference must mark inferred statements as inferred. Thus, the
> statement interface may add another bit "inferred". Even better would be
> to have an explanation attached to each inferred statement leading from
> the inferred statement to the original statement, this is very important
> for GUIs and for editing data. something such as
> Statement.getInferenceCause - returns some human-readable and
> machine-understandable explanation pointing to 1..n source statements
> and the number of the RDFS/NRL rule that was executed to come from
> source to result. (something like this... but this is science fiction)

That is an interesting idea. It would have to be an optional thing of course 
since not all backends support inference. But it could be doable.
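
One possible shape for such an optional annotation, sketched in Python. The
field names (inferred, cause, rule, sources) are assumptions made for the
example, not an existing interface. Backends without inference would simply
leave the defaults untouched.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    inferred: bool = False
    # Optional: backends without inference never set this.
    cause: Optional["InferenceCause"] = None


@dataclass(frozen=True)
class InferenceCause:
    rule: str                       # identifier of the rule that fired
    sources: Tuple[Statement, ...]  # the 1..n statements it was derived from


sub = Statement("ex:Cat", "rdfs:subClassOf", "ex:Animal")
typ = Statement("ex:tom", "rdf:type", "ex:Cat")
derived = Statement(
    "ex:tom", "rdf:type", "ex:Animal",
    inferred=True,
    cause=InferenceCause(rule="rdfs9", sources=(sub, typ)),
)
```

A GUI could then walk derived.cause.sources to show where an inferred
statement came from.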

> so, that's all I have to say. For each point: it will make things
> better, and will cause trouble if not done right.
> I have no time for discussions, please take the input as is.

:P