[Nepomuk] Duplicates merging in DataManagementModel::storeResources

Christian Mollekopf chrigi_1 at fastmail.fm
Mon Oct 31 11:42:53 UTC 2011


Hey,

This issue is getting pressing; a solution is needed for 4.8.
The feeders are currently broken because of it.

The code in storeResources is beyond me, and my attempts to fix it have failed
so far. So if no one fixes it there, I'll have to work around the issue in the
feeder code.

I don't mean to push anyone, I'd just like to know if somebody from the
Nepomuk team (yes Vishesh, I'm looking at you ;-) is going to fix this, or if
I'm on my own. As I said, I do understand if you currently lack the time to
make this happen, just tell me.

Thanks,
Christian

PS: I have inlined the pastes below, before they get deleted from pastie.

On Saturday, October 08, 2011 03:12:51 PM Christian Mollekopf wrote:
> Hi Vishesh,
> 
> The duplicates merging code doesn't cut it for the feeders yet.
> As far as I could track it down, the problem is that I have hierarchies of
> resources which need to be merged together.
> I.e. I add a contact with its email address several times to the graph. The
> email addresses are now correctly merged, but because the contacts had
> different email URIs in the first hashing run (before the addresses were
> merged), the contacts remain duplicated.
> 
> Here is the test which currently fails:
> http://paste.kde.org/131371/

void DataManagementModelTest::testStoreResources_duplicates2()
{
    SimpleResource contact1;
    contact1.addType( NCO::Contact() );
    contact1.addProperty( NCO::fullname(), QLatin1String("Spiderman") );
    contact1.addProperty( NAO::prefLabel(), QLatin1String("test") );
 
    SimpleResource email1;
    email1.addType(NCO::EmailAddress());
    email1.addProperty(NCO::emailAddress(), QLatin1String("email@foo.com"));
    contact1.addProperty(NCO::hasEmailAddress(), email1.uri());
 
    SimpleResource contact2;
    contact2.addType( NCO::Contact() );
    contact2.addProperty( NCO::fullname(), QLatin1String("Spiderman") );
    contact2.addProperty( NAO::prefLabel(), QLatin1String("test") );
 
    SimpleResource email2;
    email2.addType(NCO::EmailAddress());
    email2.addProperty(NCO::emailAddress(), QLatin1String("email@foo.com"));
    contact2.addProperty(NCO::hasEmailAddress(), email2.uri());
 
    SimpleResourceGraph graph;
    graph << email1 << contact1 << email2 << contact2;
 
    m_dmModel->storeResources( graph, "appA" );
    QVERIFY(!m_dmModel->lastError());
 
    int contactCount = m_model->listStatements( Node(), RDF::type(),
                                                NCO::Contact() ).allStatements().size();
    QCOMPARE( contactCount, 1 );
 
    int emailCount = m_model->listStatements( Node(), RDF::type(),
                                              NCO::EmailAddress() ).allStatements().size();
    QCOMPARE( emailCount, 1 );
 
    QCOMPARE( m_model->listStatements( Node(), NCO::fullname(), Node() ).allStatements().size(), 1 );
    QCOMPARE( m_model->listStatements( Node(), NAO::prefLabel(), Node() ).allStatements().size(), 1 );
 
    QVERIFY(!haveTrailingGraphs());
}
 
Add to qtest_dms.cpp:
 
    model.addStatement( NCO::emailAddress(), RDF::type(), RDF::Property(), graph );
    model.addStatement( NCO::emailAddress(), RDFS::range(), XMLSchema::string(), graph );
    model.addStatement( NCO::emailAddress(), RDFS::domain(), NCO::EmailAddress(), graph );

    model.addStatement( NCO::hasEmailAddress(), RDF::type(), RDF::Property(), graph );
    model.addStatement( NCO::hasEmailAddress(), RDFS::range(), NCO::EmailAddress(), graph );
    model.addStatement( NCO::hasEmailAddress(), RDFS::domain(), NCO::Contact(), graph );

    model.addStatement( NCO::EmailAddress(), RDF::type(), RDFS::Resource(), graph );
    model.addStatement( NCO::EmailAddress(), RDF::type(), RDFS::Class(), graph );
    model.addStatement( NCO::EmailAddress(), RDFS::subClassOf(), NCO::ContactMedium(), graph );

> 
> And here's an excerpt of the debugging output which shows the problem in the
> actual feeders:
> http://paste.kde.org/131377/
> 

nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:zre""<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>"""Sebastian Trueg""
nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:zre""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact>"
nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:zre""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>"""Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>"
nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:zre""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress>""_:gqe"

nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:gqe""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#EmailAddress>"
nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:gqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress>"""sebastian at trueg.de"^^<http://www.w3.org/2001/XMLSchema#string>"

nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>"""Sebastian Trueg""
nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:fqe""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact>"
nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>"""Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>"
nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress>""_:gqe"
 
This is the error returned after the storeResources call:
nepomukstorage(21806)/nepomuk (storage service) Nepomuk::DataManagementModel::storeResources: Setting error! "Invalid argument (1)": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname has a max cardinality of 1. Provided 2 values - "Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>, "Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>. Existing -  Affected Resource: nepomuk:/res/75164167-3ae0-413f-a991-ed73a08ca9ec, new card: 2, old card: 0"
"/opt/devel/KDE/bin/nepomukservicestub(21806)" Soprano: "Invalid argument (1)": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname has a max cardinality of 1. Provided 2 values - "Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>, "Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>. Existing -  Affected Resource: nepomuk:/res/75164167-3ae0-413f-a991-ed73a08ca9ec, new card: 2, old card: 0"

> As I understand your code, you generate a hash of each resource to check if
> two are exactly the same. That probably works for most use cases, but I'm
> not sure if it is the best solution.
> Given the problem above, you'd have to rerun the hashing for the resources
> which were modified due to a merged resource, so that already complicates
> matters.
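
To make the re-hashing idea concrete, here is a minimal standalone sketch in
plain C++ without any Nepomuk or Soprano types. Res and mergeDuplicates are
made-up names, the property set itself stands in for the hash, and references
are recognised simply by matching a URI string. It is only an illustration of
"repeat until nothing merges anymore", not how storeResources is implemented.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Props = std::set<std::pair<std::string, std::string>>;

// Hypothetical stand-in for a SimpleResource: a URI plus (property, value)
// pairs. A value is either a literal or the URI of another resource.
struct Res {
    std::string uri;
    Props props;
};

// Merge resources with identical property sets and rewrite references to the
// dropped duplicates, repeating until a pass finds nothing left to merge.
std::vector<Res> mergeDuplicates(std::vector<Res> graph)
{
    bool changed = true;
    while (changed) {
        changed = false;
        std::map<Props, std::string> seen;           // property set -> kept URI
        std::map<std::string, std::string> replaced; // dropped URI -> kept URI
        std::vector<Res> kept;
        for (const Res& r : graph) {
            auto it = seen.find(r.props);
            if (it == seen.end()) {
                seen.emplace(r.props, r.uri);
                kept.push_back(r);
            } else {
                replaced[r.uri] = it->second;
                changed = true;
            }
        }
        // Point the remaining resources at the kept URIs, so that parents
        // which only differed in their child URIs compare equal next pass.
        for (Res& r : kept) {
            Props rewritten;
            for (const auto& p : r.props) {
                auto it = replaced.find(p.second);
                rewritten.emplace(p.first,
                                  it == replaced.end() ? p.second : it->second);
            }
            r.props = std::move(rewritten);
        }
        graph = std::move(kept);
    }
    return graph;
}
```

With the two contacts and two email addresses from the test above, the first
pass merges the addresses, the second merges the contacts, and the third finds
nothing left to do. Each extra level in the hierarchy costs one more pass,
which is the complication I mean.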
> 
> I thought maybe it would be possible to leave the merging up to the normal
> resource merger. This would have the effect that not only exactly equal
> resources would be merged, but every resource that the resource merger
> would normally merge.
> If you think of the SimpleResourceGraph as a tree, a post-order traversal of
> the tree would allow you to store each resource one by one, starting from
> the leaves of a branch and working up to the root. The ResourceMerger would
> then automatically merge all resources as necessary.
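
A rough standalone sketch of that post-order idea, again in plain C++ with
made-up names (Graph, Store, storePostOrder, storeGraph). Store::storeOne is
only a placeholder for whatever the real ResourceMerger does to identify an
existing resource; here it matches on an identical property set. The sketch
also assumes the graph has no cycles, which a real SimpleResourceGraph may not
guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

using Props = std::set<std::pair<std::string, std::string>>;

// Hypothetical input graph: temporary URI -> properties of that resource.
using Graph = std::map<std::string, Props>;

// Hypothetical store. Identical property sets map to one stored URI.
struct Store {
    std::map<Props, std::string> byProps;

    std::string storeOne(const Props& props)
    {
        auto it = byProps.find(props);
        if (it != byProps.end())
            return it->second; // already stored: reuse, i.e. "merge"
        std::string uri = "res:" + std::to_string(byProps.size() + 1);
        byProps.emplace(props, uri);
        return uri;
    }
};

// Post-order: store everything a resource refers to first, then the resource
// itself with its references replaced by the stored URIs.
// Assumes an acyclic graph; a cycle would recurse forever.
std::string storePostOrder(const std::string& uri, const Graph& graph,
                           Store& store,
                           std::map<std::string, std::string>& done)
{
    auto d = done.find(uri);
    if (d != done.end())
        return d->second;

    Props resolved;
    for (const auto& p : graph.at(uri)) {
        if (graph.count(p.second)) // value refers to another graph resource
            resolved.emplace(p.first,
                             storePostOrder(p.second, graph, store, done));
        else                       // plain literal
            resolved.emplace(p.first, p.second);
    }
    std::string stored = store.storeOne(resolved);
    done[uri] = stored;
    return stored;
}

// Stores the whole graph and returns how many resources the store holds.
std::size_t storeGraph(const Graph& graph, Store& store)
{
    std::map<std::string, std::string> done;
    for (const auto& entry : graph)
        storePostOrder(entry.first, graph, store, done);
    return store.byProps.size();
}
```

Because each email address is stored before the contact that refers to it,
both contacts already point at the same stored address when their turn comes,
so a single pass is enough and no re-hashing is needed.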
> 
> Do you think that would be a viable option?
> 
> Cheers,
> Christian
> 
> _______________________________________________
> Nepomuk mailing list
> Nepomuk at kde.org
> https://mail.kde.org/mailman/listinfo/nepomuk