[Nepomuk] Strigi Feeder

Wed Jul 14 11:29:35 CEST 2010

Hi Vishesh,

great work. Some comments:

On 07/13/2010 10:21 PM, Vishesh Handa wrote:
> The current patch checks for blank Nodes in the object / subject of the
> file's metadata, and tries to find them or creates them if not present.
> The patch reverts to a the original behavior if any of the additionally
> generated metadata ( contacts, albums) contain any blank nodes. in order
> to fix this, a full blown dependency resolution algorithm would be
> required. I don't think that it is currently required.

Could you elaborate on this please. I do not follow exactly. The problem
would be if there was for example a contact that was related to another
contact? If so, I doubt any strigi analyzer does create such data atm.

> The patch also creates a different graph ( discardable ) for each
> individual resource. 

very good.

> *Problem not fixed :*
> This will only work on newly indexed files and does not affect the files
> which have already been indexed. We'll need some kind of merger to do
> that. It's a lot simpler to just re-index the files, but I don't think
> the end users would like that.

I am still for reindexing files based on mimetype whenever there are new
features available for that mimetype.

> *A New Problem :
> *Since the additional metadata now has it's own graph. It will not be
> deleted if the file is deleted. We need some kind of cleaner which
> cleans resources which are no longer in use.

I see 2 solutions:
1. Do it in a cleaner service the same way we clean up unused metadata
graphs and the like.
2. Do it when deleting the files' metadata in the strigi service and in
the filewatch service using shared code.

And now the code review:
1. Please add comments on all methods as if it were public API. While
this is not required and I do not do it all the time it will simplify
working on the code together.
2. the variable names are not very descriptive: m_queue and m_stack. :P
3. I think we could do most of generateGraph using Soprano::NRLModel
4. createNode() is not descriptive enough. It is not clear what it does
when you see it used in code. Better call it something like
blankNodeFromIdentifier or something like that.
5. the feeder is started twice (from a conceptual point of view): once
after constructing it and once in startAnalysis. IMHO you could simply
start the feeder in its constructor and thus, hide the threading issue
away more.
6. "// Handle the feeder" is not the greatest comment as it only
clutters the code without adding any new information. The fact that
something is being done with the feeder is obvious. :)
7. the name StrigiFeeder is a bit misleading. The class does not feed
anything into Strigi. It does the exact opposite. So maybe NepomukFeeder
would be better... i dont know.
8. You did it again: a static method named "toSparql". Please do not do
that. :)
9. Why don't you store the set of statements as ResourceHash right away?

As you can see I have no "real" comments since IMHO you did a great job.
So please go ahead and commit that (maybe with some changes based on my
comments) to trunk. Then testing can commence. :)

Cheers,
Sebastian