[Nepomuk] [Soprano-devel] Refactoring for Soprano 3

Tue Oct 20 21:52:34 CEST 2009

Sebastian Trüg wrote:
> On Friday 16 October 2009 00:32:01 Greg Beauchesne wrote:
>> Hmm... I can see how that would be useful if you have a lot of
>> listeners, although I think most of my listeners can reject statements
>> they don't care about pretty quickly. Maybe something taking your
>> statements pattern thing a little further, wherein the system allows
>> models to report partial update notifications (e.g. "A statement or
>> statements with the obj:blah subject were added, but that's all the
>> information I have"), as opposed to the all-or-nothing reporting that
>> currently exists.
> 
> So you mean keep the current signal-based system but make it simpler for the 
> backends?

Maybe, though not necessarily. Any system completely based on Qt's
signals and slots ends up being "one size fits all" with regard to what
information is reported back. But I do think the patterns are important
either way.

>> From what I gather, Nepomuk deals with a lot of persistent/cached data,
>> right? I guess that would make sense that you then have a persistent URI
>> to refer back to.
> 
> There are still some instances that would make sense to be represented by 
> blank nodes. One example is the address instances related to contacts.
> Maybe if we improve the blank node situation in Soprano3 we can also think 
> about dropping that restriction.

Fair enough.

>> 3. Pooled: NO, private data: YES - Node data may or may not be
>> immediately realized. This type is the fastest when used with single
>> Models. When Model boundaries are crossed, the optimization data is
>> lost/ignored, and the Node is just treated like #1. Client code does not
>> create this directly, but if it knows the target Model in advance, it
>> could ask the Model to attach its private data.
> 
> I don't really get the last sentence. Why and how would a client ask a Model 
> to attach its internal data. I thought that a backend like redland would 
> simply always use its internal node representation for lazy conversion of the 
> node data.

Much like the client code pre-pooling an otherwise constant Node for use
with multiple models, it could be more efficient to pre-optimize a Node
for a series of queries on a particular Model.

Example:

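// Build the predicate Node once, outside the loop.
Node myNode(QUrl("http://this.uri/gets/reused"));
// internalize() is the call proposed here, not existing API: it returns a
// copy of the Node with this Model's private data attached.
myNode = model->internalize(myNode);
Q_FOREACH (const Node &subj, subjList) {
    StatementIterator it = model->listStatements(subj, myNode, Node());
    ...
}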

Without internalize() above, the "http://this.uri/gets/reused" URI has to
be reprocessed for every call to listStatements().
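
To make the cost difference concrete, here is a rough, self-contained
sketch in plain C++. Nothing in it is Soprano API: ToyNode, ToyModel,
lookupPredicate() and runQueries() are made-up stand-ins, and internalize()
is only the hypothetical call from the example above. It assumes the
expensive step is mapping the URI string to a backend-internal id, and it
simply counts how often that mapping runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a Node; not the Soprano class.
struct ToyNode {
    std::string uri;              // core RDF data, always realized in this sketch
    const void* owner = nullptr;  // model whose private data is attached, if any
    std::size_t internalId = 0;   // backend-private handle, meaningful only for owner
};

// Hypothetical stand-in for a Model backend.
class ToyModel {
public:
    // Returns a copy of the node with this model's private handle attached.
    ToyNode internalize(const ToyNode& n) {
        ToyNode out = n;
        out.owner = this;
        out.internalId = resolve(n.uri);
        return out;
    }

    // Stand-in for listStatements(): only the predicate resolution is modelled.
    std::size_t lookupPredicate(const ToyNode& pred) {
        if (pred.owner == this) {
            return pred.internalId;  // fast path: private data belongs to this model
        }
        return resolve(pred.uri);    // slow path: reprocess the URI string
    }

    // Number of times a URI string had to be processed so far.
    std::size_t resolveCount() const { return resolves_; }

private:
    // Maps a URI string to an internal id, assigning a new id on first sight.
    std::size_t resolve(const std::string& uri) {
        ++resolves_;
        auto it = ids_.find(uri);
        if (it != ids_.end()) {
            return it->second;
        }
        const std::size_t id = ids_.size() + 1;
        ids_.emplace(uri, id);
        return id;
    }

    std::unordered_map<std::string, std::size_t> ids_;
    std::size_t resolves_ = 0;
};

// Mirrors the loop in the example above: one lookup per subject.
std::size_t runQueries(ToyModel& model, const ToyNode& pred, int subjects) {
    for (int i = 0; i < subjects; ++i) {
        model.lookupPredicate(pred);
    }
    return model.resolveCount();
}
```

In this toy, five lookups with a plain node process the string five times.
After internalize(), the same five lookups add no further processing. If the
internalized node is handed to a different ToyModel, the private data is
ignored and the lookup takes the slow path, like type #1.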

> The way I see it there is only one way to get internal data into a node: using 
> a dedicated constructor or operator=.
> The only disadvantage of the model storing its internal data is a little more 
> memory being used.
> Although maybe some backends might need to keep track of all the nodes they 
> created, since the model could, in theory, be deleted before all the nodes, 
> leaving dangling pointers. That would be an additional overhead that is 
> not necessary in many situations, as in Nepomuk, for example, where pretty 
> much everything is done via queries and internal data thus has no advantage.
> But then there would have to be a configuration parameter or something which 
> could be used by a client to enable/disable the use of internal node data.
> And this would make sense on a generic Model/Repository level.

Well, again, the presence of private data doesn't necessarily mean the
core RDF Node data is unrealized (it could be, but it doesn't have to
be). In other words, lazy evaluation is not its sole purpose. Though for
blank nodes, it doesn't really matter -- if the Model goes away, its
blank nodes aren't much use anyway.
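
One way to picture this, purely as an assumption and not a proposal for the
actual classes: the node keeps its core data and its private data as two
independent things, and only holds a weak reference to the backend's data so
that a deleted Model cannot leave a dangling pointer. SketchNode and
PrivateData below are invented names, and std::weak_ptr is just the simplest
standard-library tool for the job.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical backend-private record; nothing here is Soprano API.
struct PrivateData {
    std::string cachedUri;  // what the backend could hand back on demand
};

class SketchNode {
public:
    // Eager: core data is realized, private data is a pure optimization.
    SketchNode(std::string uri, std::weak_ptr<PrivateData> priv)
        : uri_(std::move(uri)), realized_(true), priv_(std::move(priv)) {}

    // Lazy: core data stays unrealized until realize() is called.
    explicit SketchNode(std::weak_ptr<PrivateData> priv)
        : realized_(false), priv_(std::move(priv)) {}

    bool isRealized() const { return realized_; }
    bool hasLivePrivateData() const { return !priv_.expired(); }

    // Tries to fetch the core data from the backend; returns false if the
    // node was never realized and the backend data is already gone.
    bool realize() {
        if (realized_) {
            return true;
        }
        if (std::shared_ptr<PrivateData> p = priv_.lock()) {
            uri_ = p->cachedUri;
            realized_ = true;
        }
        return realized_;
    }

    // Only meaningful once isRealized() is true.
    const std::string& uri() const { return uri_; }

private:
    std::string uri_;
    bool realized_;
    std::weak_ptr<PrivateData> priv_;
};
```

An eager node survives the Model going away, because it just loses the
optimization. A lazy node that was never realized does not, which is the
same situation as a blank node outliving its Model.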

-- Greg