[Nepomuk] Duplicates merging in DataManagementModel::storeResources
Sebastian Trüg
trueg at kde.org
Mon Oct 31 11:12:35 UTC 2011
Hi Christian,
let's meet up this week to discuss the problem and hopefully fix it. So
far I have stayed clear of the storeResources code, but with Vishesh not
having much time I will dive into it.
Cheers,
Sebastian
On 10/31/2011 12:42 PM, Christian Mollekopf wrote:
> Hey,
>
> This issue is getting pressing; a solution is needed for 4.8.
> Currently the feeders are broken because of it.
>
> The code in storeResources is beyond me, and my attempts to fix it have
> failed so far. If no one fixes it there, I'll have to work around the issue
> in the feeder code.
>
> I don't mean to push anyone, I'd just like to know if somebody from the
> nepomuk team (yes vishesh I'm looking at you ;-) is going to fix this, or if
> I'm on my own. As said, I do understand if you currently lack the time to make
> this happen, just tell me.
>
> Thanks,
> Christian
>
> PS: I have included the pastes inline below before they are deleted from the pastebin.
>
> On Saturday, October 08, 2011 03:12:51 PM Christian Mollekopf wrote:
>> Hi Vishesh,
>>
>> The duplicate-merging code doesn't cut it for the feeders yet.
>> As far as I could track it down, the problem is that I have hierarchies of
>> resources which need to be merged together.
>> For example, I add a contact with its email address to the graph several
>> times. The email addresses are now merged correctly, but because the
>> contacts had different email URIs in the first hashing run (before the
>> email addresses were merged), the contacts remain duplicated.
>>
>> Here is the test which currently fails:
>> http://paste.kde.org/131371/
>
> void DataManagementModelTest::testStoreResources_duplicates2()
> {
>     SimpleResource contact1;
>     contact1.addType( NCO::Contact() );
>     contact1.addProperty( NCO::fullname(), QLatin1String("Spiderman") );
>     contact1.addProperty( NAO::prefLabel(), QLatin1String("test") );
>
>     SimpleResource email1;
>     email1.addType(NCO::EmailAddress());
>     email1.addProperty(NCO::emailAddress(), QLatin1String("email at foo.com"));
>     contact1.addProperty(NCO::hasEmailAddress(), email1.uri());
>
>     SimpleResource contact2;
>     contact2.addType( NCO::Contact() );
>     contact2.addProperty( NCO::fullname(), QLatin1String("Spiderman") );
>     contact2.addProperty( NAO::prefLabel(), QLatin1String("test") );
>
>     SimpleResource email2;
>     email2.addType(NCO::EmailAddress());
>     email2.addProperty(NCO::emailAddress(), QLatin1String("email at foo.com"));
>     contact2.addProperty(NCO::hasEmailAddress(), email2.uri());
>
>     SimpleResourceGraph graph;
>     graph << email1 << contact1 << email2 << contact2;
>
>     m_dmModel->storeResources( graph, "appA" );
>     QVERIFY(!m_dmModel->lastError());
>
>     int contactCount = m_model->listStatements( Node(), RDF::type(),
>                                                 NCO::Contact() ).allStatements().size();
>     QCOMPARE( contactCount, 1 );
>
>     int emailCount = m_model->listStatements( Node(), RDF::type(),
>                                               NCO::EmailAddress() ).allStatements().size();
>     QCOMPARE( emailCount, 1 );
>
>     QCOMPARE( m_model->listStatements( Node(), NCO::fullname(), Node() ).allStatements().size(), 1 );
>     QCOMPARE( m_model->listStatements( Node(), NAO::prefLabel(), Node() ).allStatements().size(), 1 );
>
>     QVERIFY(!haveTrailingGraphs());
> }
>
> Add to qtest_dms.cpp:
>
> model.addStatement( NCO::emailAddress(), RDF::type(), RDF::Property(), graph );
> model.addStatement( NCO::emailAddress(), RDFS::range(), XMLSchema::string(), graph );
> model.addStatement( NCO::emailAddress(), RDFS::domain(), NCO::EmailAddress(), graph );
>
> model.addStatement( NCO::hasEmailAddress(), RDF::type(), RDF::Property(), graph );
> model.addStatement( NCO::hasEmailAddress(), RDFS::range(), NCO::EmailAddress(), graph );
> model.addStatement( NCO::hasEmailAddress(), RDFS::domain(), NCO::Contact(), graph );
>
> model.addStatement( NCO::EmailAddress(), RDF::type(), RDFS::Resource(), graph );
> model.addStatement( NCO::EmailAddress(), RDF::type(), RDFS::Class(), graph );
> model.addStatement( NCO::EmailAddress(), RDFS::subClassOf(), NCO::ContactMedium(), graph );
>
>>
>> And here's an excerpt of the debugging output which shows the problem in the
>> actual feeders:
>> http://paste.kde.org/131377/
>>
>
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:zre""<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>"""Sebastian Trueg""
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:zre""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:zre""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>"""Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:zre""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress>""_:gqe"
>
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:gqe""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#EmailAddress>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:gqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#emailAddress>"""sebastian at trueg.de"^^<http://www.w3.org/2001/XMLSchema#string>"
>
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/08/15/nao#prefLabel>"""Sebastian Trueg""
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:fqe""<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#PersonContact>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname>"""Sebastian Trueg"^^<http://www.w3.org/2001/XMLSchema#string>"
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources:
> "_:fqe""<http://www.semanticdesktop.org/ontologies/2007/03/22/nco#hasEmailAddress>""_:gqe"
>
> This is the error returned after the storeResources call:
> nepomukstorage(21806)/nepomuk (storage service)
> Nepomuk::DataManagementModel::storeResources: Setting error! "Invalid argument
> (1)": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname has a
> max cardinality of 1. Provided 2 values - "Sebastian
> Trueg"^^<http://www.w3.org/2001/XMLSchema#string>, "Sebastian
> Trueg"^^<http://www.w3.org/2001/XMLSchema#string>. Existing - Affected
> Resource: nepomuk:/res/75164167-3ae0-413f-a991-ed73a08ca9ec, new card: 2, old
> card: 0"
> "/opt/devel/KDE/bin/nepomukservicestub(21806)" Soprano: "Invalid argument
> (1)": "http://www.semanticdesktop.org/ontologies/2007/03/22/nco#fullname has a
> max cardinality of 1. Provided 2 values - "Sebastian
> Trueg"^^<http://www.w3.org/2001/XMLSchema#string>, "Sebastian
> Trueg"^^<http://www.w3.org/2001/XMLSchema#string>. Existing - Affected
> Resource: nepomuk:/res/75164167-3ae0-413f-a991-ed73a08ca9ec, new card: 2, old
> card: 0"
>
>> As I understand your code, you generate a hash of each resource to check
>> whether two are exactly the same. That probably works for most use cases,
>> but I'm not sure it is the best solution.
>> Given the problem above, you'd have to rerun the hashing for every resource
>> that was modified because a resource it references got merged, which
>> already complicates matters.
>>
>> I thought maybe it would be possible to leave the merging up to the normal
>> resource merger. This would have the effect that not only exactly equal
>> resources would be merged, but every resource that the resource merger
>> would normally merge.
>> If you think of the SimpleResourceGraph as a tree, a post-order traversal of
>> the tree would allow you to store the resources one by one, starting from
>> the leaves of each branch and working towards the root. The ResourceMerger
>> would then automatically merge all resources as necessary.
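The post-order idea can be sketched in the same standalone fashion. Again, none of this is the real Nepomuk API: `Store` is a hypothetical stand-in for the store plus ResourceMerger that only matches exact property sets (the real merger would match more loosely), and `storePostOrder`, `storeGraph` and `exampleContacts` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

using Props = std::set<std::pair<std::string, std::string>>;
using Graph = std::map<std::string, Props>; // resource id -> properties

// Hypothetical stand-in for the store plus ResourceMerger: a resource whose
// properties match an already stored one is mapped onto that resource.
struct Store {
    std::map<Props, std::string> byProps; // property set -> stored URI

    std::string store(const Props& props)
    {
        auto it = byProps.find(props);
        if (it != byProps.end())
            return it->second;
        std::string uri = "res:" + std::to_string(byProps.size() + 1);
        byProps.emplace(props, uri);
        return uri;
    }
};

// Stores `id` only after everything it references has been stored, and
// returns the URI the resource ended up with.
std::string storePostOrder(const std::string& id, const Graph& graph, Store& store,
                           std::map<std::string, std::string>& resolved)
{
    auto done = resolved.find(id);
    if (done != resolved.end())
        return done->second;
    resolved[id] = id; // placeholder that stops the recursion if the graph has a cycle

    Props rewritten;
    for (const auto& p : graph.at(id)) {
        const bool isRef = graph.count(p.second) > 0; // value names another resource
        rewritten.insert({p.first, isRef ? storePostOrder(p.second, graph, store, resolved)
                                         : p.second});
    }
    const std::string uri = store.store(rewritten);
    resolved[id] = uri;
    return uri;
}

Store storeGraph(const Graph& graph)
{
    Store store;
    std::map<std::string, std::string> resolved;
    for (const auto& entry : graph)
        storePostOrder(entry.first, graph, store, resolved);
    return store;
}

// Two identical contacts, each pointing at its own copy of the same email.
Graph exampleContacts()
{
    return {
        {"_:c1", {{"rdf:type", "nco:Contact"}, {"nco:fullname", "Spiderman"}, {"nco:hasEmailAddress", "_:e1"}}},
        {"_:e1", {{"rdf:type", "nco:EmailAddress"}, {"nco:emailAddress", "email at foo.com"}}},
        {"_:c2", {{"rdf:type", "nco:Contact"}, {"nco:fullname", "Spiderman"}, {"nco:hasEmailAddress", "_:e2"}}},
        {"_:e2", {{"rdf:type", "nco:EmailAddress"}, {"nco:emailAddress", "email at foo.com"}}},
    };
}
```

Because each contact is stored only after its email has been resolved to a stored URI, both contacts arrive at the store with identical properties and collapse into one, with no second pass needed. The placeholder entry only keeps the recursion from looping; a graph with real cycles would need a different strategy.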
>>
>> Do you think that would be a viable option?
>>
>> Cheers,
>> Christian
>>
>> _______________________________________________
>> Nepomuk mailing list
>> Nepomuk at kde.org
>> https://mail.kde.org/mailman/listinfo/nepomuk