[Nepomuk] Refactoring for Soprano 3

Thu Oct 15 21:51:24 CEST 2009

Hi Sebastian,

From my experience:
- Using Sesame2 as inspiration is the right way to go. Try to ignore
Jena. Dunno about Virtuoso.
- Use exactly the names you suggested, and do use RepositoryFactory - this
is correct, stick with it.
- Your UML has a typo: the "Repositoy" is too good to be true.

What needs to be considered:
* close/disconnect/dispose (however you put it). A repository connection
must be explicitly closed.
* dirty reads - the RepositoryConnection must support dirty reads of its
own changes in both the listStatements AND the SPARQL implementations. This
is a major issue in nepomuk-java, which I implemented in ClientSession and
diffmodelset. Having dirty reads on your connection-aware changes
with SPARQL support is the best way to go, although the hardest to
achieve. It allows you to build nice GUIs showing the results "as if
committed" while not having committed yet. Note that dirty reads are very
unlikely to work for text indexing.
* thread-safety: Sesame2 leaves a lot to be desired here. My wish is:
many, many threads, realized by multiple RepositoryConnections. Read/write
must be possible for multiple connections without blocking. For one
RepositoryConnection, being safe for use from a single thread only is a
good tradeoff.
* writing to the repository should not invalidate open readers and
queries. It sucks hard when RDF APIs suddenly "fail" in a half-iterated
query result when you concurrently make changes. An iterator should be
valid from start to end, despite changes made through the same
RepositoryConnection while the iterator is open. If this is not
supported, it pushes a lot of hassle onto the individual developer.
* listening to changes based on configurable filters - I interpret the
"statementAdded" and "statementRemoved" methods as triggering events. I
implemented something like "addStatementListener(listener, filter)" to
push the filtering/notification optimisation to the store.
* I would remove all these super/sub interfaces (Filter*) and just have
one interface for each part. Let the implementation throw exceptions if
it doesn't understand a method; nobody will bother checking the
interfaces anyway, it's a hassle.
* Desire: Inference must mark inferred statements as inferred. Thus, the
Statement interface may add another bit "inferred". Even better would be
to have an explanation attached to each inferred statement, leading from
the inferred statement back to the original statement. This is very
important for GUIs and for editing data. Something such as
Statement.getInferenceCause - returns some human-readable and
machine-understandable explanation pointing to 1..n source statements
and the number of the RDFS/NRL rule that was executed to get from
source to result. (Something like this... but this is science fiction.)
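
To make the dirty-read, stable-iterator and filtered-listener points
above a bit more concrete, here is a minimal sketch in Python (not the
Soprano C++ API). All class and method names in it are made up for
illustration and are not taken from Soprano, Sesame2 or nepomuk-java.
It assumes an in-memory set of triples, and it leaves out SPARQL,
inference and any real thread-safety.

```python
from typing import Callable, Iterator, List, Set, Tuple

# Hypothetical types, for illustration only.
Statement = Tuple[str, str, str]           # (subject, predicate, object)
Filter = Callable[[Statement], bool]
Listener = Callable[[str, Statement], None]


class Repository:
    """Shared store that hands out independent connections."""

    def __init__(self) -> None:
        self.statements: Set[Statement] = set()
        self.listeners: List[Tuple[Listener, Filter]] = []

    def connection(self) -> "RepositoryConnection":
        return RepositoryConnection(self)

    def add_statement_listener(self, listener: Listener, flt: Filter) -> None:
        # The filter is stored next to the listener, so the store decides
        # which events are delivered at all.
        self.listeners.append((listener, flt))

    def notify(self, event: str, statement: Statement) -> None:
        for listener, flt in self.listeners:
            if flt(statement):
                listener(event, statement)


class RepositoryConnection:
    """One unit of work; meant to be used from a single thread."""

    def __init__(self, repo: Repository) -> None:
        self._repo = repo
        self._added: Set[Statement] = set()
        self._removed: Set[Statement] = set()
        self._open = True

    def _check_open(self) -> None:
        if not self._open:
            raise RuntimeError("connection is closed")

    def add_statement(self, s: Statement) -> None:
        self._check_open()
        self._removed.discard(s)
        self._added.add(s)

    def remove_statement(self, s: Statement) -> None:
        self._check_open()
        self._added.discard(s)
        self._removed.add(s)

    def list_statements(self, flt: Filter = lambda s: True) -> Iterator[Statement]:
        self._check_open()
        # Dirty read: committed data overlaid with this connection's own
        # pending changes. The result is copied into a list first, so the
        # iterator stays valid even if changes follow while it is open.
        view = (self._repo.statements - self._removed) | self._added
        return iter(sorted(s for s in view if flt(s)))

    def commit(self) -> None:
        self._check_open()
        added = self._added - self._repo.statements
        removed = self._removed & self._repo.statements
        self._repo.statements -= removed
        self._repo.statements |= added
        self._added, self._removed = set(), set()
        # Listeners only hear about changes that were actually committed.
        for s in sorted(removed):
            self._repo.notify("removed", s)
        for s in sorted(added):
            self._repo.notify("added", s)

    def close(self) -> None:
        # Explicit close: pending changes are dropped, further use fails.
        self._added, self._removed = set(), set()
        self._open = False
```

With this sketch, a connection sees its own uncommitted statements in
list_statements(), a second connection on the same Repository does not
see them until commit(), and an iterator obtained before a later
remove_statement() still yields the statements it started with.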

So, that's all I have to say. For each point: it will make things
better, and will cause trouble if not done right.
I have no time for discussions, please take the input as is.

best
Leo

It was Sebastian Trüg who said at the right time 13.10.2009 16:57 the
following words:
> Hello,
>
> Soprano 2 IMHO has a nice design. Backend, Model, and FilterModel suited my 
> needs very well and were nice to use. But now it slowly reaches its limits.
>
> With Virtuoso the introduction of transactions becomes very important. I tried
> to stack them on top of Model as you can see in the experimental branch[1] but
> the resulting design is flawed and cannot be any different since we need to
> stay binary compatible for KDE.
>
> Thus, it is time for a redesign and also time for you to state your needs and 
> ideas.
>
> Attached you find a preliminary design in the form of a crude UML diagram. It
> is based on the sesame2 idea of Repository vs. RepositoryConnection.
>
> A few words of explanation:
> - Model is split into two classes, Repository and RepositoryConnection.
>   Each RepositoryConnection represents its own transaction object, i.e.
>   can perform multiple transactions.
> - A Repository can create an arbitrary number of RepositoryConnection
>   objects.
> - Backend is now RepositoryFactory (better name welcome). It is used to
>   create StorageRepositories which replace StorageModel and have settings
>   like storage dir and the like.
> - FilterRepository and FilterRepositoryConnection replace the current
>   FilterModel.
>   A FilterRepository would need to create its own connections which carry
>   as a member a parent connection created by the parent Repository.
>
> This is the basis I would like to start the discussion from.
> Another issue is inference, which I would like to integrate deeper into
> Soprano. The straightforward design would be to add a flags parameter to
> methods like query() and listStatements() which can for example be
> EnableInference.
>
> Please discuss.
>
> Cheers,
> Sebastian

-- 
_____________________________________________________
Dr. Leo Sauermann       http://www.dfki.de/~sauermann 

Deutsches Forschungszentrum fuer 
Kuenstliche Intelligenz DFKI GmbH
Trippstadter Strasse 122
P.O. Box 2080           Fon:   +43 6991 gnowsis
D-67663 Kaiserslautern  Fax:   +49 631 20575-102
Germany                 Mail:  leo.sauermann at dfki.de

_____________________________________________________