XMP, Krita, KFileMetaInfo and Strigi

Evgeny Egorochkin phreedom.stdin at gmail.com
Sun Jun 24 19:45:48 BST 2007


On Sunday 24 June 2007 19:47:20 Richard Dale wrote:
> On Saturday 23 June 2007, Evgeny Egorochkin wrote:
> > On Saturday 23 June 2007 22:46:06 Richard Dale wrote:
> > > On Saturday 23 June 2007, Evgeny Egorochkin wrote:
> > > > On Saturday 23 June 2007 20:08:50 Cyrille Berger wrote:
> > > > > > That is why I thought Nepomuk could not
> > > > > > solve the problem right away. What would be of interest, however,
> > > > > > would be to also store the data in Nepomuk to make it searchable
> > > > > > and linkable. Maybe for this we need a bridging ontology that
> > > > > > links XMP data with data specified in Nepomuk ontologies.
> > > > >
> > > > > Yes, but that's really for the future :)
> > > >
> > > > Not so distant. Many XMP features can be mapped to the existing
> > > > Nepomuk ontology, and I'll consider the missing XMP features when
> > > > coming up with suggestions for the next Nepomuk ontology draft.
> > > >
> > > > The problem is that you can't do a 1:1 Nepomuk-XMP mapping :(
> > >
> > > Does that mean that Nepomuk has its own ontology, and we can't use
> > > existing RDF ones?
> >
> > Reusing existing ontologies is not as straightforward as you might think.
> > The devil is in the details.
> >
> > The problem is that one ontology might have a Name property for a person,
> > while another has FirstName and LastName properties.
>
> Yes, but that's why RDF has namespaces. You can always write a bridge
> between ontologies such that when, say, a FOAF-based data set for personal
> details is stored in the triple store, you might infer VCARD (or NEPOMUK)
> equivalents and add them too.

I don't think there are any plans to expose ontologies other than the Nepomuk
one, or adaptations of other ontologies to Nepomuk, via the KMetaData
interface.

However, import and export of data in other ontologies is available. The idea
is to have a coherent and consistent interface to the data, and not just a
set of special-purpose narrow ontologies.
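
To make the bridging idea concrete, here is a minimal sketch in Python using
rdflib (purely illustrative: the input file name and the choice of FOAF and
vCard properties are my assumptions, and the actual Nepomuk/KMetaData
interface is C++ and may work quite differently):

  # Sketch of the kind of ontology bridge Richard describes: after FOAF
  # data lands in the store, infer a combined vCard full name so that
  # vCard-oriented queries can also find the person.
  from rdflib import Graph, Literal, Namespace, RDF

  FOAF = Namespace("http://xmlns.com/foaf/0.1/")
  VCARD = Namespace("http://www.w3.org/2006/vcard/ns#")

  g = Graph()
  g.parse("contacts.rdf")  # hypothetical input file containing FOAF data

  for person in g.subjects(RDF.type, FOAF.Person):
      first = g.value(person, FOAF.firstName)
      last = g.value(person, FOAF.surname)
      if first is not None and last is not None:
          g.add((person, VCARD.fn, Literal("%s %s" % (first, last))))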

> > One ontology might express a rating as an integer in the 0-100 range,
> > another uses a float in the 0-1 range, etc.
>
> Yes, but you can map one onto the other if your program (the Nepomuk
> extractor) knows about the definitions.

I didn't deny this.
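
For a value-level mapping like the rating example, the bridging code can be a
few lines; a sketch with rdflib, where both property URIs are invented for
illustration:

  # Normalise a 0-100 integer rating from one ontology into the 0.0-1.0
  # float range another ontology expects. Both URIs are made up here.
  from rdflib import Graph, Literal, Namespace
  from rdflib.namespace import XSD

  ONTO_A = Namespace("http://example.org/ontoA#")
  ONTO_B = Namespace("http://example.org/ontoB#")

  def bridge_ratings(g):
      for subj, _, rating in g.triples((None, ONTO_A.rating, None)):
          g.add((subj, ONTO_B.rating,
                 Literal(int(rating) / 100.0, datatype=XSD.float)))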

> > When ontologies use essentially different structures/ideologies to
> > represent data, it becomes a real headache.
>
> No, I disagree - one of the main features of RDF is to allow ontologies to
> be combined.

In fact, its generic structure lets you easily lump different data together.
However, combining/integrating usually can't be done just by writing another
RDF(S) file. You have to write some code.

> That might not always be easy. If I have a library
> classification system for religions, and the Dewey decimal system doesn't
> have many classification types for Islam as opposed to Christianity, it
> doesn't mean that a clever person can't derive a mapping of Dewey religions
> onto an Islamic equivalent, and the other way too.

But you can't get a lossless round-trip between these classifications, and
that's what I meant when I said 95% of features. Overall, it works, but
there are still inconsistencies.

> > Often you have several ontologies trying to represent the same data (the
> > many media ontologies are an example of this).
> >
> > This is further complicated by outdated ontologies and ontologies with
> > questionable practices (I had pointed out some issues with XMP which go
> > against today's semantics approaches, but they were fine when the
> > standard was drafted).
>
> So only the Nepomuk team can design ontologies correctly,

No, it's not only Nepomuk. Adding properties to literals (as is done in XMP)
is expressly forbidden by all of today's standards; software will reject such
data, etc. So it's not a Nepomuk invention, and I explained in a previous
email why it is fundamentally flawed.
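
To illustrate why this matters in practice: where XMP in effect hangs a
qualifier off a literal value, a well-formed import has to promote the
literal to a node of its own first. A hypothetical repair in rdflib (the
property names are invented; this is not Nepomuk's actual import code):

  # Instead of attaching a qualifier to a bare literal (invalid RDF),
  # promote the value to a blank node carrying rdf:value plus the
  # qualifier, which every standards-compliant store will accept.
  from rdflib import BNode, Graph, Literal, Namespace, RDF

  EX = Namespace("http://example.org/xmp-bridge#")

  g = Graph()
  doc = EX.photo1

  title = BNode()                       # intermediate node for the value
  g.add((doc, EX.title, title))
  g.add((title, RDF.value, Literal("Sunset")))
  g.add((title, EX.qualifier, Literal("working title")))  # now legal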

> and we should 
> just discard XMP data and not add it to the triple store?

I said we might lose 1-10% of the data due to ontology inconsistencies. It
depends on how severe the inconsistencies are and how much coding effort is
put into this. It doesn't mean all data is discarded.

> This doesn't seem
> to ring true to me. If you want to design an extra ontology for KDE, like
> SCOT does for tagging data, that is fine (I assume you can tag XMP via
> SCOT), but I don't think you should just discard XMP data and not write it
> to the store.

The problem is that you can't write XMP data directly to the store: since
it's malformed RDF, it would pollute the DB. The data has to be transformed
to play nicely with the rest of the DB.

> I really think it is important that a SPARQL endpoint on a KDE
> desktop should be exactly the same as a SPARQL endpoint on the web.

Nepomuk is more than just an RDF DB. RDF is the underlying technology ATM,
yes, but Nepomuk is intended to provide a consistent interface to data.

E.g. regardless of which document format you use, be it ODF, MS Office or
some HTML with meta tags, you have exactly the same interface to find out
who the author is, etc. If a new file format is introduced, all apps will
work with it transparently.
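
A sketch of what that uniform interface buys you, again in rdflib terms (the
property URI and file name are placeholders, not real Nepomuk ontology
terms): once every extractor maps its format-specific author field to one
shared property, a single query covers ODF, MS Office, HTML and any future
format.

  # One query finds the author of every document, regardless of the
  # format it was extracted from.
  from rdflib import Graph

  g = Graph()
  g.parse("metadata-store.rdf")  # hypothetical dump of the local store

  results = g.query("""
      PREFIX ex: <http://example.org/meta#>
      SELECT ?doc ?author WHERE { ?doc ex:author ?author . }
  """)
  for doc, author in results:
      print(doc, author)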

At the same time, I agree with you that it might be beneficial to let apps
access tags in their native ontologies, e.g. XMP.

The problem here is a common API: since e.g. XMP is malformed RDF, some other
cases might not map cleanly to RDF no matter what you do to them.
Otherwise (without a common API), apps could just as well use the appropriate
low-level access libraries.

While this is an interesting feature, and it would allow apps like Krita to
access full XMP via Nepomuk, I don't think there's sufficient coding effort
that could be allocated to this ATM.

> XMP 
> locally combined with XMP data queried from the internet is more useful
> than Nepomuk data which only works locally.

Considering that Nepomuk can map most XMP data, I don't see a major problem:
Nepomuk would aggregate all that data just fine, and in fact it would allow a
lot more apps to interface with XMP data sources.

> > Thus, making an ontology as generic as Nepomuk is intended to be involves
> > compromises.
> >
> > Also, if it becomes clear you can't be compatible in some aspect with all
> > existing ontologies, it makes sense to adopt the most sensible approach
> > according to today's knowledge, and not yesterday's assumptions or
> > outright mistakes.
> >
> > That said, proper ontology design and a reasonable coding effort can map
> > 90-95% of other ontologies' features, even in the case of complex and
> > troublesome ontologies.
> >
> > Compare this to typical usage cases like ID3v2.4. Hardly anyone uses even
> > 10 of the tags provided by the standard, when in fact there are 10x more.
> >
> > The concern about 100% 1:1 mapping is for certain production apps that
> > must have access to all the obscure features of a particular standard.
> >
> > > Is it ok to read from the Redland triple store directly, or should it
> > > always be done via a Nepomuk service? I ask because Ruby has some nice
> > > software for reading and writing to RDF stores (ActiveRDF), and I would
> > > like to be able to use those APIs for kde/ruby programming.
> > > QtRuby/Korundum can use QtDBus too. Should C++ programs access Nepomuk
> > > via Soprano (or ActiveRDF for Ruby), and hence the triple store
> > > directly, or only go via a D-Bus API?
> >
> > While at this moment it might not matter, eventually you may expect some
> > performance tweaks, and the data stored quite differently from how it is
> > presented via the KMetaData API. Also, I think in many cases the native
> > constructs of OO languages are much easier to use than raw RDF.
>
> I don't care much about performance tweaks - I was thinking more about
> whether there will be some sort of inference engine in Nepomuk that would
> only be available via the C++ API. And so, if you queried the triple store
> in Ruby via ActiveRDF, you would miss out on these assumptions derived from
> the knowledge that the triple store is part of the KDE desktop.

You can expect the low-level contents to differ significantly from the RDF
dataset seen by high-level APIs.

At the same time, the data seen by high-level APIs is still highly likely
(I'm not aware of any plans to make it otherwise) to be 1:1 mappable to RDF,
so it may be possible to provide direct interfaces to the native RDF
processing capabilities of other languages.

You really need to talk with Sebastian regarding this.

-- Evgeny



