[Nepomuk] Duplicates merging in DataManagementModel::storeResources

Christian Mollekopf chrigi_1 at fastmail.fm
Sat Oct 8 13:12:51 UTC 2011


Hi Vishesh,

The duplicate-merging code doesn't cut it for the feeders yet.
As far as I could track it down, the problem is that I have hierarchies of
resources which need to be merged together.

For example, I add a contact with its email address to the graph several
times. The email addresses are now merged correctly, but because the contacts
referenced different email URIs during the first hashing run (before the
emails had been merged), the contacts remain duplicated.

Here is the test which currently fails:
http://paste.kde.org/131371/

And here's an excerpt of the debugging output which shows the problem in the 
actual feeders:
http://paste.kde.org/131377/

As I understand your code, you generate a hash of each resource to check
whether two resources are exactly the same. That probably works for most use
cases, but I'm not sure it is the best solution.
Given the problem above, you'd have to rerun the hashing for every resource
that was modified because one of the resources it references got merged, so
that already complicates matters.
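
To make the rerun idea concrete, here is a minimal sketch in plain C++. It is
not the Nepomuk code: Properties, Graph, mergeExactDuplicatesOnce and
mergeUntilStable are hypothetical names, a resource is reduced to a set of
(property, value) pairs, and comparing property sets stands in for comparing
hashes. A value counts as a reference if it equals the URI of a merged
resource.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a resource: a set of (property, value) pairs.
using Properties = std::set<std::pair<std::string, std::string>>;
// Hypothetical stand-in for the graph: resource URI -> properties.
using Graph = std::map<std::string, Properties>;

// One merging pass: resources with identical property sets are collapsed
// into the first one seen, and references to the dropped URIs are rewritten.
// Returns the mapping "dropped URI -> kept URI" (empty if nothing merged).
std::map<std::string, std::string> mergeExactDuplicatesOnce(Graph& graph) {
    std::map<Properties, std::string> firstWithProps;
    std::map<std::string, std::string> remap;
    for (const auto& entry : graph) {
        auto result = firstWithProps.emplace(entry.second, entry.first);
        if (!result.second) {
            remap[entry.first] = result.first->second;
        }
    }
    for (const auto& merged : remap) {
        // The kept resource is never itself a duplicate, so it survives.
        assert(graph.count(merged.second) == 1);
        graph.erase(merged.first);
    }
    // Rewrite references; this is what changes the "hash" of the parents.
    for (auto& entry : graph) {
        Properties rewritten;
        for (const auto& prop : entry.second) {
            auto hit = remap.find(prop.second);
            rewritten.emplace(prop.first,
                              hit == remap.end() ? prop.second : hit->second);
        }
        entry.second = std::move(rewritten);
    }
    return remap;
}

// Rerun the pass until nothing merges any more.
// Returns the number of passes that actually merged something.
int mergeUntilStable(Graph& graph) {
    int passes = 0;
    while (!mergeExactDuplicatesOnce(graph).empty()) {
        ++passes;
    }
    return passes;
}
```

With two identical contacts that each point to their own copy of the same
email, the first pass only merges the emails; the contacts only become equal,
and get merged, in the second pass. A hierarchy of depth n needs up to n
passes this way.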

I thought it might be possible to leave the merging to the normal resource
merger. The effect would be that not only exactly equal resources get merged,
but every resource the resource merger would normally merge.
If you think of the SimpleResourceGraph as a tree, a post-order traversal of
the tree would allow you to store the resources one by one, starting at the
leaves of each branch and working up to the root. The ResourceMerger would
then automatically merge all resources as necessary.
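
A rough sketch of the traversal, again in plain C++ and under assumptions:
Properties, Graph, StoreFn, storePostOrder, storeGraph and makeToyMerger are
hypothetical names, the graph is assumed to be acyclic, and the toy merger
only merges exactly equal resources, which is much simpler than what the real
ResourceMerger does. The point is only the order: children are stored first,
and the parent is stored with the URIs its children ended up with.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

using Properties = std::set<std::pair<std::string, std::string>>;
using Graph = std::map<std::string, Properties>;
// Hypothetical store step: takes a resource and returns the URI under which
// it ended up (its own URI, or that of an existing resource it merged into).
using StoreFn =
    std::function<std::string(const std::string&, const Properties&)>;

// Stores the resource `uri` after everything it references (post-order).
// Assumes the graph has no reference cycles.
std::string storePostOrder(const Graph& graph, const std::string& uri,
                           const StoreFn& store,
                           std::map<std::string, std::string>& resolved) {
    auto done = resolved.find(uri);
    if (done != resolved.end()) {
        return done->second;  // already stored via another parent
    }
    auto node = graph.find(uri);
    assert(node != graph.end());
    Properties rewritten;
    for (const auto& prop : node->second) {
        // A value that names another resource in the graph is a reference.
        const bool isReference = graph.count(prop.second) > 0;
        rewritten.emplace(
            prop.first,
            isReference ? storePostOrder(graph, prop.second, store, resolved)
                        : prop.second);
    }
    const std::string finalUri = store(uri, rewritten);
    resolved[uri] = finalUri;
    return finalUri;
}

// Stores the whole graph; returns "original URI -> URI after storing".
std::map<std::string, std::string> storeGraph(const Graph& graph,
                                              const StoreFn& store) {
    std::map<std::string, std::string> resolved;
    for (const auto& entry : graph) {
        storePostOrder(graph, entry.first, store, resolved);
    }
    return resolved;
}

// Toy merger: reuses an already stored resource with identical properties.
StoreFn makeToyMerger(std::map<Properties, std::string>& stored) {
    return [&stored](const std::string& uri, const Properties& props) {
        return stored.emplace(props, uri).first->second;
    };
}
```

Because each contact is stored only after its email has been resolved, two
contacts pointing at duplicate emails already look identical when they reach
the merger, so a single pass is enough.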

Do you think that would be a viable option?

Cheers,
Christian


