[Nepomuk] Review Request: StoreResources: Add a flag to force duplicate detection in the graph

Vishesh Handa me at vhanda.in
Mon Oct 8 16:45:57 UTC 2012



> On Oct. 8, 2012, 3:24 p.m., Christian Mollekopf wrote:
> > That's fine from the PIM side, but I'd still be interested where you'd want to avoid the duplicates merging. It seems like a crucial feature to me as soon as we have multiple applications operating on the same data, where we can't know which data is already present in the store and which isn't.
> > So it might make sense to change the semantics so you can disable the duplicates merging and have it on by default, as it seems to me more like a performance optimization for cases where we know that no duplicates exist.
> > Otherwise we could render the whole database pretty quickly useless by creating a massive amount of duplicates.
> > 
> > Or am I just misunderstanding something?

Uhh. No.

The duplicate merging only applies to data that is not already present in Nepomuk. Basically, duplicates within the SimpleResourceGraph that you provided. Example -

_:a a nao:Tag ;
    nao:identifier "Tag1" .

_:b a nao:Tag ;
    nao:identifier "Tag1" .

_:c a nfo:FileDataObject ;
    nao:hasTag _:a, _:b .

Case 1 : When Tag1 does not already exist + Flag off - In that case _:c will have 2 tags attached to it, both of which have the same identifier but different resource URIs.

Case 2 : When Tag1 does not already exist + Flag on - In that case the SimpleResourceGraph will be checked for duplicates, and _:a and _:b will be found to be identical, so they will be merged together into _:a. _:c will then contain only 1 tag. This is a pre-processing step. After it, the normal identification process runs to determine whether a tag with identifier "Tag1" already exists.

Case 3 : When Tag1 does exist + Flag off - Both _:a and _:b will be identified to <nepomuk:/res/tag1-uri> and _:c will only have 1 tag.
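
To make the pre-processing step in the cases above concrete, here is a rough sketch in Python. It is not the Nepomuk code: the function name merge_duplicates, the dict-based graph layout and the fingerprinting scheme are all assumptions made for illustration. It only shows the idea of fingerprinting each resource, keeping the first one seen, and rewriting references to the dropped ones.

```python
# Illustrative sketch only. merge_duplicates and the dict layout are
# hypothetical; this is not how Nepomuk's storeResources is implemented.

def merge_duplicates(graph):
    """Merge resources that have exactly the same properties.

    graph maps a blank-node id to {property: set of values}.
    Returns a new graph; the input is left untouched.
    """
    canonical = {}  # property fingerprint -> first node id seen with it
    remap = {}      # dropped duplicate id -> surviving node id

    for node, props in graph.items():
        # Fingerprint the resource by its full property set (the "hash" step).
        key = frozenset((p, frozenset(vals)) for p, vals in props.items())
        survivor = canonical.setdefault(key, node)
        if survivor != node:
            remap[node] = survivor

    merged = {}
    for node, props in graph.items():
        if node in remap:
            continue  # duplicate, dropped in favour of its survivor
        # Point references to dropped duplicates at the survivor instead.
        merged[node] = {
            p: {remap.get(v, v) for v in vals} for p, vals in props.items()
        }
    # Single pass only: resources that become identical after this
    # rewriting are not merged again in this sketch.
    return merged


# The example graph from this mail.
graph = {
    "_:a": {"rdf:type": {"nao:Tag"}, "nao:identifier": {"Tag1"}},
    "_:b": {"rdf:type": {"nao:Tag"}, "nao:identifier": {"Tag1"}},
    "_:c": {"rdf:type": {"nfo:FileDataObject"}, "nao:hasTag": {"_:a", "_:b"}},
}

merged = merge_duplicates(graph)
```

With the flag off this step is simply skipped, so _:a and _:b both reach the normal identification process unchanged.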

Does this make it clear?


- Vishesh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://git.reviewboard.kde.org/r/106711/#review20081
-----------------------------------------------------------


On Oct. 3, 2012, 12:11 p.m., Vishesh Handa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://git.reviewboard.kde.org/r/106711/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2012, 12:11 p.m.)
> 
> 
> Review request for Nepomuk, Christian Mollekopf and Sebastian Trueg.
> 
> 
> Description
> -------
> 
>     StoreResources: Add a flag to force duplicate detection in the graph
>     
>     By default each SimpleResource in the graph was always hashed (an
>     expensive process) and then checked for duplicates with the other
>     SimpleResources in the graph.
>     
>     This feature was only added because the PIM guys were pushing large
>     quantities of duplicate data. It doesn't make sense for everyone to pay
>     the penalty for one application.
>     
>     They can enable this feature with the MergeDuplicateResources flag.
> 
> 
> Diffs
> -----
> 
>   libnepomukcore/datamanagement/datamanagement.h 2ac60a5 
>   services/storage/datamanagementmodel.cpp 7c05cfd 
>   services/storage/test/datamanagementmodeltest.cpp 3d3340c 
> 
> Diff: http://git.reviewboard.kde.org/r/106711/diff/
> 
> 
> Testing
> -------
> 
> Updated the relevant tests
> 
> 
> Thanks,
> 
> Vishesh Handa
> 
>
