[Nepomuk] Refactoring for Soprano 3

Thu Oct 22 16:46:30 CEST 2009

Hi,

On Tue, Oct 20, 2009 at 4:18 PM, Sebastian Trüg <trueg at kde.org> wrote:
>> RepositoryFactory must be used to create any repository, be it a
>> StorageRepository or a ProxyRepository.
>
> No, that does not make sense. The only reason for the factory is the plugin
> system. Any other class can be instantiated directly.

I understand that. Currently the factory may be planned only to
instantiate a StorageRepository from a list of repository plugins
that handle storage differently. But in future there might be a
scenario where we need to instantiate a ProxyRepository that provides
different functionality. Take your example, "Imagine you simply want
to drop some statements in addStatement": there will surely be other
cases where plugins handle different things. So why not consider this
possibility now? Once you start using plugins, they are mostly taken
in directions we had not originally imagined... I just wanted to be
ready for such a scenario ;)
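
To make the idea concrete, here is a rough sketch of a factory that
can hand out both storage and proxy repositories through the same
plugin registry. It is only an illustration in plain C++: the
registerPlugin() and create() methods, the kind() helper and the
plugin names are my assumptions, not the actual Soprano 3 API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal interface, for illustration only.
class Repository {
public:
    virtual ~Repository() = default;
    virtual std::string kind() const = 0;
};

class StorageRepository : public Repository {
public:
    std::string kind() const override { return "storage"; }
};

// A proxy wraps another repository and may alter its behaviour.
class ProxyRepository : public Repository {
public:
    explicit ProxyRepository(std::shared_ptr<Repository> parent)
        : parent_(std::move(parent)) {}
    std::string kind() const override { return "proxy(" + parent_->kind() + ")"; }
private:
    std::shared_ptr<Repository> parent_;
};

// One registry for every kind of repository plugin. A creator receives
// the parent repository; storage plugins simply ignore it.
class RepositoryFactory {
public:
    using Creator =
        std::function<std::shared_ptr<Repository>(std::shared_ptr<Repository>)>;

    void registerPlugin(const std::string& name, Creator creator) {
        creators_[name] = std::move(creator);
    }

    // Returns nullptr when no plugin with that name is registered.
    std::shared_ptr<Repository> create(const std::string& name,
                                       std::shared_ptr<Repository> parent = nullptr) const {
        auto it = creators_.find(name);
        if (it == creators_.end())
            return nullptr;
        return it->second(std::move(parent));
    }

private:
    std::map<std::string, Creator> creators_;
};
```

With this shape, a proxy plugin that nobody has thought of yet can be
registered later without touching the factory itself.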

>> A ProxyRepository creates a ProxyConnection.
>
> What is the advantage of this? The way I see it ProxyConnection is a
> convenience class that makes it simpler to implement proxies. Imagine you
> simply want to drop some statements in addStatement. Then there is no need to
> reimplement all the other methods. They can simply be forwarded.
> If you, however, need to do something really fancy where you need to
> reimplement all of them there is no real need for a ProxyConnection. There
> might even be some other RepositoryConnection subclass that is better suited.

ProxyConnection is just the renamed FilterRepositoryConnection; there
is nothing special about it - sorry, I forgot to mention it ;)
But I think you meant ProxyRepository in the above example... I thought
that was the class intended to process statements. Or can a connection
start dropping statements?
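
If the dropping does happen at the connection level, the forwarding
idea could look roughly like the sketch below. Everything here is a
simplified stand-in written in plain C++: the three-string Statement,
MemoryConnection and DropPredicateConnection are hypothetical names
for illustration, not existing Soprano classes.

```cpp
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for an RDF statement.
struct Statement {
    std::string subject;
    std::string predicate;
    std::string object;
};

class RepositoryConnection {
public:
    virtual ~RepositoryConnection() = default;
    virtual bool addStatement(const Statement& s) = 0;
    virtual std::vector<Statement> listStatements() const = 0;
};

// Trivial in-memory backend so the example is self-contained.
class MemoryConnection : public RepositoryConnection {
public:
    bool addStatement(const Statement& s) override {
        statements_.push_back(s);
        return true;
    }
    std::vector<Statement> listStatements() const override { return statements_; }
private:
    std::vector<Statement> statements_;
};

// Convenience base class: every call is forwarded to the parent.
class ProxyConnection : public RepositoryConnection {
public:
    explicit ProxyConnection(RepositoryConnection& parent) : parent_(parent) {}
    bool addStatement(const Statement& s) override { return parent_.addStatement(s); }
    std::vector<Statement> listStatements() const override {
        return parent_.listStatements();
    }
private:
    RepositoryConnection& parent_;
};

// Overrides only addStatement; listStatements is inherited and forwarded.
class DropPredicateConnection : public ProxyConnection {
public:
    DropPredicateConnection(RepositoryConnection& parent, std::string blocked)
        : ProxyConnection(parent), blocked_(std::move(blocked)) {}

    bool addStatement(const Statement& s) override {
        if (s.predicate == blocked_)
            return false;  // silently drop the unwanted statement
        return ProxyConnection::addStatement(s);
    }
private:
    std::string blocked_;
};
```

Only the one method that changes behaviour is reimplemented, which is
the convenience described in the quoted paragraph.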

>> For inferencing, I think that EnableInference should be a member of
>> RepositoryConnection, instead of adding it for each query. It is
>> ideally set per connection and can always be disabled by setting
>> the member temporarily
>
> I thought having flags in the query and the listStatements methods would allow
> to have more options in there. But I have to admit that ATM I would not know
> any flags other than inference.

That is exactly the scenario for which I wanted the flags in
RepositoryConnection. When we add new flags later, we need not go
changing every function ;)
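
A small sketch of what I mean by keeping the flags on the connection
and disabling one temporarily. The setFlag()/testFlag() methods and
the ScopedFlag helper are assumptions made up for this example; only
the EnableInference name comes from the proposal.

```cpp
#include <cstdint>

// Bit flags; new ones can be added here without changing any
// query or listStatements signature.
enum QueryFlag : std::uint32_t {
    NoFlags = 0,
    EnableInference = 1u << 0
};

class RepositoryConnection {
public:
    void setFlag(QueryFlag flag, bool on) {
        if (on)
            flags_ |= flag;
        else
            flags_ &= ~static_cast<std::uint32_t>(flag);
    }
    bool testFlag(QueryFlag flag) const { return (flags_ & flag) != 0; }
private:
    std::uint32_t flags_ = NoFlags;
};

// Sets a flag for the lifetime of the guard and restores the
// previous value afterwards.
class ScopedFlag {
public:
    ScopedFlag(RepositoryConnection& conn, QueryFlag flag, bool on)
        : conn_(conn), flag_(flag), previous_(conn.testFlag(flag)) {
        conn_.setFlag(flag_, on);
    }
    ~ScopedFlag() { conn_.setFlag(flag_, previous_); }

    ScopedFlag(const ScopedFlag&) = delete;
    ScopedFlag& operator=(const ScopedFlag&) = delete;
private:
    RepositoryConnection& conn_;
    QueryFlag flag_;
    bool previous_;
};
```

A caller that needs one query without inference would create a
ScopedFlag around it and leave every method signature untouched.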

>> As inference is planned to be integrated deep, I think
>> Soprano::Statement should be changed to incorporate 2 things
>> 1. inferred flag - whether the statement has been inferred or is new
>
> This is something Leo suggested, too. I think it is a good idea. I am simply
> not sure if any backend can pull that off.
>
>> 2. referenceStatements - for inferred statements, this list contains
>> all the parent statements from which the inference was reached. For
>> others, this is null
>
> Again a very nice idea but here I am sure that nothing does support it. And
> the question is: does it make sense to already add it although there will be
> no support for it by any backend?

I am not sure how to pull it off... It would be great to have when
Nepomuk starts getting smart and starts predicting choices based on
inferences.
But maybe we can just think about it for some more time ;)
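
Just to show the data shape I have in mind, here is a sketch of a
statement carrying both pieces of information. This is not
Soprano::Statement; the struct, the string-based nodes and the
makeAsserted()/makeInferred() helpers are hypothetical, and no backend
is assumed to be able to fill them in.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Statement;
using StatementPtr = std::shared_ptr<const Statement>;

struct Statement {
    std::string subject;
    std::string predicate;
    std::string object;
    bool inferred = false;                          // 1. inferred flag
    std::vector<StatementPtr> referenceStatements;  // 2. parents, empty if asserted
};

// A statement that was added explicitly.
StatementPtr makeAsserted(std::string s, std::string p, std::string o) {
    auto st = std::make_shared<Statement>();
    st->subject = std::move(s);
    st->predicate = std::move(p);
    st->object = std::move(o);
    return st;
}

// A statement derived by inference, remembering where it came from.
StatementPtr makeInferred(std::string s, std::string p, std::string o,
                          std::vector<StatementPtr> parents) {
    auto st = std::make_shared<Statement>();
    st->subject = std::move(s);
    st->predicate = std::move(p);
    st->object = std::move(o);
    st->inferred = true;
    st->referenceStatements = std::move(parents);
    return st;
}
```

For example, from "ex:rex rdf:type ex:Dog" and "ex:Dog rdfs:subClassOf
ex:Animal" an inferencer could produce "ex:rex rdf:type ex:Animal"
with both parents attached.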

>> * writing to the repository should not invalidate open readers and
>> queries. It sucks hard when RDF apis suddenly "fail" in a half-iterated
>> query result when you concurrently do changes. An iterator should be
>> valid from start to end, despite doing changes to the same
>> repositoryconnection while the iterator is open. If this is not
>> supported, it pushes a lot of hassle to the individual developer
>
> I think writes should not be possible on a connection with an open
> iterator/cursor. If an application needs to read and write at the same time it
> opens two connections, right?

Makes sense in a way... Maybe we can add a mode (read/write) to the
createConnection() method in the Repository class, so that the
connection knows what it is used for and can optimise things
internally (like controlling record locking, etc.).
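
Roughly, the mode parameter could look like the sketch below. The
ConnectionMode enum, the string-based statements and the exception on
a read-only write are my assumptions for illustration; real locking
is left out entirely.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

enum class ConnectionMode { ReadOnly, ReadWrite };

class RepositoryConnection {
public:
    RepositoryConnection(std::vector<std::string>& store, ConnectionMode mode)
        : store_(store), mode_(mode) {}

    ConnectionMode mode() const { return mode_; }

    // Writing is refused on connections opened for reading only.
    void addStatement(const std::string& statement) {
        if (mode_ != ConnectionMode::ReadWrite)
            throw std::logic_error("connection was opened read-only");
        store_.push_back(statement);
    }

    std::size_t count() const { return store_.size(); }

private:
    std::vector<std::string>& store_;
    ConnectionMode mode_;
};

class Repository {
public:
    // The mode is fixed at creation, so the connection can pick its
    // internal strategy (locking, caching, ...) up front.
    std::unique_ptr<RepositoryConnection> createConnection(ConnectionMode mode) {
        return std::unique_ptr<RepositoryConnection>(
            new RepositoryConnection(statements_, mode));
    }
private:
    std::vector<std::string> statements_;
};
```

An application that reads and writes at the same time would then open
one ReadOnly and one ReadWrite connection, as you suggest.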

Cheers,
Hari