[Nepomuk] Strigi Feeder

Wed Jul 14 13:30:56 CEST 2010

Forgot to do "Reply to All".

On Wed, Jul 14, 2010 at 4:59 PM, Vishesh Handa <handa.vish at gmail.com> wrote:

> Hey Sebastian
>
> On Wed, Jul 14, 2010 at 2:59 PM, Sebastian Trüg <trueg at kde.org> wrote:
>
>> Hi Vishesh,
>>
>> great work. Some comments:
>>
>> On 07/13/2010 10:21 PM, Vishesh Handa wrote:
>> > The current patch checks for blank Nodes in the object / subject of the
>> > file's metadata, and tries to find them or creates them if not present.
>> > The patch reverts to a the original behavior if any of the additionally
>> > generated metadata ( contacts, albums) contain any blank nodes. in order
>> > to fix this, a full blown dependency resolution algorithm would be
>> > required. I don't think that it is currently required.
>>
>> Could you elaborate on this please. I do not follow exactly. The problem
>> would be if there was for example a contact that was related to another
>> contact? If so, I doubt any strigi analyzer does create such data atm.
>>
>>  Yes, the problem would arise if one contact is related to another. AFAIK
> no strigi analyzer creates such data, and even if one of them does create
> such data. It would land up creating the resources which is what the earlier
> IndexWriter did.
>
>
>> > The patch also creates a different graph ( discardable ) for each
>> > individual resource.
>>
>> very good.
>>
>> > *Problem not fixed :*
>> > This will only work on newly indexed files and does not affect the files
>> > which have already been indexed. We'll need some kind of merger to do
>> > that. It's a lot simpler to just re-index the files, but I don't think
>> > the end users would like that.
>>
>> I am still for reindexing files based on mimetype whenever there are new
>> features available for that mimetype.
>>
>
> Same here, but I doubt the users would be. I was trying to implement
> resource merging yesterday and I don't think it is difficult, the only
> problem is that I couldn't find any fast way to check if resources need to
> be merged. ( Maybe in 1-2 queries, any idea? ) The actual merging would be
> somewhat slow, but that's okay. It would be done in a separate thread and
> would only be done once or twice.
>
>
>>
>> > *A New Problem :
>> > *Since the additional metadata now has it's own graph. It will not be
>> > deleted if the file is deleted. We need some kind of cleaner which
>> > cleans resources which are no longer in use.
>>
>> I see 2 solutions:
>> 1. Do it in a cleaner service the same way we clean up unused metadata
>> graphs and the like.
>> 2. Do it when deleting the files' metadata in the strigi service and in
>> the filewatch service using shared code.
>>
>
> The second option would be easier, but I'm not too keen on slowing down the
> metadatamover service or the strigi service. They have enough to deal with
> as it is. For now, we can just let the additional resources be.
>
>>
>>
>> And now the code review:
>> 1. Please add comments on all methods as if it were public API. While
>> this is not required and I do not do it all the time it will simplify
>> working on the code together.
>>
>
> Okay
>
>
>> 2. the variable names are not very descriptive: m_queue and m_stack. :P
>>
>
> I've renamed m_queue to m_updateQueue, but I'm not sure what I should do
> with m_stack. I could maybe call it feeder stack? Do you have suggestions? I
> have however, add comments that say what the variable does.
>
>
>> 3. I think we could do most of generateGraph using Soprano::NRLModel
>> 4. createNode() is not descriptive enough. It is not clear what it does
>> when you see it used in code. Better call it something like
>> blankNodeFromIdentifier or something like that.
>>
>
> I've renamed it to createBlankOrResourceNode() as that is what it does.
>
>
>> 5. the feeder is started twice (from a conceptual point of view): once
>> after constructing it and once in startAnalysis. IMHO you could simply
>> start the feeder in its constructor and thus, hide the threading issue
>> away more.
>>
>
> Done. :)
>
>
>> 6. "// Handle the feeder" is not the greatest comment as it only
>> clutters the code without adding any new information. The fact that
>> something is being done with the feeder is obvious. :)
>>
>
> Uhh. Yea, stupid comment. I've removed it for now.
>
>
>> 7. the name StrigiFeeder is a bit misleading. The class does not feed
>> anything into Strigi. It does the exact opposite. So maybe NepomukFeeder
>> would be better... i dont know.
>>
>
> I found the name NepomukFeeder rather generic. I've named it to
> NepomukIndexFeeder for now. :-/
>
>
>> 8. You did it again: a static method named "toSparql". Please do not do
>> that. :)
>>
>
> But this time I added documentation as to what kind of query it creates. I
> understand that these kind of function names are bad, but I can't think of
> any alternative. Could you please suggest some name?
>
>
>> 9. Why don't you store the set of statements as ResourceHash right away?
>>
>> Yes. I thought about this for quite some time. I actually made a list of
> pros and cons. In the end I decided to go with this approach as it required
> less modification of code ( less error prone ) and is technically more
> portable. Plus, ResourceStruct is a lousy name and I didn't really want to
> make the IndexWriter depend on it.
>
> But I think we should think about standardizing ResourceStruct. I use it a
> lot in BackupSync and I use it in the API Artem wants ( which is difficult
> to design ). Plus it's really convenient!
>
>
>> As you can see I have no "real" comments since IMHO you did a great job.
>> So please go ahead and commit that (maybe with some changes based on my
>> comments) to trunk. Then testing can commence. :)
>>
>
> Are you sure? Just say "Yes' and I'll commit it.
>
>>
>> Cheers,
>> Sebastian
>> _______________________________________________
>> Nepomuk mailing list
>> Nepomuk at kde.org
>> https://mail.kde.org/mailman/listinfo/nepomuk
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.kde.org/pipermail/nepomuk/attachments/20100714/c6d9e640/attachment.htm