XMP, Krita, KFileMetaInfo and Strigi

Fri Jun 22 20:09:26 BST 2007

On Friday 22 June 2007 20:09:27 Jos van den Oever wrote:
> 2007/6/22, Cyrille Berger <cberger at cberger.net>:
> > > > But the biggest problems and limitations come from KFileMetaInfoItem
> > > > - QVariant is not good enough:
> > > >  - it lacks some important types like rationals
> > >
> > > a double is not a rational? QVariant::Double
> >
> > Yes it is, but a rational isn't a double, it's a division between two
> > integers.
> > For instance, there is no double representation of the rational 2 / 3
> > (0.66666666666666666666666666666666 is just an approximation, even if the
> > mathematical theory says that 0.666 followed only by 6 is equal to 2 / 3,
> > but there is no way to represent such a thing in double, 2/3 is
> > represented by 0.666..67 in a double anyway).
> It is very simple and not expensive to find a nice rational from a
> double. For the graphical applications you're referring to, zoom
> ratios, resolutions and more are nicely expressed in ratios. The range
> for the denominator is however very limited, making it even easier to
> transform a double into a rational. There is no need to store it as
> two integers.

+1. It may be nice to have a hint somewhere that the value should be displayed 
as a ratio if possible.
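
To make the "find a nice rational from a double" idea concrete, here is a minimal sketch. It assumes the denominator range is small enough that every denominator can simply be tried. Ratio, nearestRatio and toRatioString are names made up for this mail, not Qt or KFileMetaInfo API.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical helper type, not part of Qt or KFileMetaInfo.
struct Ratio {
    long long num;
    long long den;
};

// Find the fraction num/den closest to x with 1 <= den <= maxDen.
// Only a strictly smaller error replaces the current best, so ties keep
// the smallest denominator (0.5 becomes 1/2, not 2/4).
Ratio nearestRatio(double x, long long maxDen)
{
    assert(maxDen >= 1);
    Ratio best{std::llround(x), 1};
    double bestErr = std::fabs(x - static_cast<double>(best.num));
    for (long long den = 2; den <= maxDen; ++den) {
        const long long num = std::llround(x * static_cast<double>(den));
        const double err =
            std::fabs(x - static_cast<double>(num) / static_cast<double>(den));
        if (err < bestErr) {
            best = Ratio{num, den};
            bestErr = err;
        }
    }
    return best;
}

// What a GUI could do when a "display as ratio" hint is set on the value.
std::string toRatioString(double x, long long maxDen)
{
    const Ratio r = nearestRatio(x, maxDen);
    return std::to_string(r.num) + "/" + std::to_string(r.den);
}
```

With a limit of 100, toRatioString(2.0 / 3.0, 100) gives "2/3". Trying every denominator is fine for the small ranges of zoom ratios and resolutions; a continued-fraction search would be the usual choice for larger ones.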

> > > >  - no distinction between ordered list / unordered list (I know it
> > > > might seem unimportant, just treat an unordered list as ordered, but
> > > > the fact is an application should never ever sort an ordered list)
> > >
> > > I'm not sure I understand, if a list must be sorted, the analyzer
> > > should sort it and pass it on sorted. Can you provide the use case?
> >
> > No you didn't understand, there are two kinds of lists: lists for which
> > order matters (a list of authors, who are usually sorted in order of
> > importance of their work) and lists for which order doesn't matter (like
> > keywords). And while it's very understandable that an application wants
> > to sort a list of keywords, it is unacceptable for a list of authors.
>
> Yes, so the lists that the application might like to sort can already
> be sorted by the analyzer so that the application does not ever have
> to do the sorting. Then it will not do sorting when it is not
> appropriate.

Ordered/unordered list is a valid point. However, it's not hard to implement.
Lists are always stored ordered, and a hint records whether the order
matters or not.
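
A rough sketch of such a hint, under the assumption that it lives next to the list itself. ListValue and displayOrder are hypothetical names, not existing KDE API.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical value type: the items are always stored in a definite order,
// and a flag records whether that order carries meaning.
struct ListValue {
    std::vector<std::string> items;
    bool orderMatters;
};

// Return the items in the order a GUI may show them: sorted only when the
// hint says the stored order is not significant.
std::vector<std::string> displayOrder(const ListValue &list)
{
    std::vector<std::string> shown = list.items;
    if (!list.orderMatters) {
        std::sort(shown.begin(), shown.end());
    }
    return shown;
}
```

A list of authors would carry orderMatters = true and come back untouched, while keywords would carry false and may be sorted freely.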

> > > >  - there are two associative arrays in XMP (and only QMap in
> > > > QVariant), Structure and Alternative Array, there is in fact a huge
> > > > difference between the two, Structure has a limited set of possible
> > > > keys, while Alternative Array is not limited
> > >
> > > Sounds like QValidator + QMap to me. Input validation when editing is
> > > important. That is why KFileWritePlugin provides access to a
> > > QValidator for the widgets.
> >
> > Unrelated.
> > Let's take an example out of the XMP spec, it defines a Dimension
> > structure as follows:
> >  - one field named "w" of type real
> >  - one field named "h" of type real
> >  - one field named "unit" of type string
> >
> > How I would use that structure in the code is:
> > QMap<QString, Value> dimensionValue;
> > dimensionValue["w"] = 10.0;
> > dimensionValue["h"] = 20.0;
> > dimensionValue["unit"] = "cm";
> >
> > Then I create a metadata entry:
> > metaDataList["nameofthetag"] = Value(dimensionValue, Value::TypeStructure
> > );
>
> How is this better than simply width and height with a default unit
> where the GUI decides in which appropriate unit to display?

This exposes a more fundamental problem. XMP seems to be based on RDF;
however, the current RDF spec, general practice and understanding of the
subject differ from it in some respects. This may be due to the age of XMP,
I don't know.

ATM the typical approach to defining e.g. sizes is as follows:

1) If the possible units are of the same kind but differ by orders of
magnitude, e.g. cm, m, km etc., a default unit is used (m), and it's up to the
GUI to do the rest.

2) If the possible units are essentially different, like pixels/cm/% of
parent, different properties are created, named like SizePx, SizeCm,
SizePercentage, each with a default unit. Also, possibly a parent property is
created to group all related sizes.

The advantage of this approach is that much more semantic information
becomes available to computers.

* It's prettier since typical dimensions can be specified like this:
	SampleImage SizeXPx 32
	SampleImage SizeYPx 48

* You can mix different sizes like specifying X in cm and Y in pixels.

* Actual information is better separated (for easy processing) from GUI
nuances.

* Units and unit relations are described like other data structures. This way
apps can better understand units they were not hardcoded to work with.

Mind you, it's still possible to have a mapping with XMP, since in this case 
XMP provides a subset of functionality.
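
One way such a mapping could look, as a sketch only. The property names SizeXM/SizeYM/SizeXPx/SizeYPx, the unit strings and the function name are assumptions for illustration and are not taken from the XMP spec or from Nepomuk.

```cpp
#include <map>
#include <string>

// XMP-style Dimension structure as described in the quoted mail:
// two reals and a unit string.
struct XmpDimensions {
    double w;
    double h;
    std::string unit;
};

// Map the structure onto flat properties. Units of the same kind
// (cm, m, km) are converted to the default unit, metres; pixels get their
// own properties. An unknown unit yields an empty map.
std::map<std::string, double> toFlatProperties(const XmpDimensions &d)
{
    static const std::map<std::string, double> metresPerUnit = {
        {"cm", 0.01}, {"m", 1.0}, {"km", 1000.0}};

    std::map<std::string, double> props;
    const auto it = metresPerUnit.find(d.unit);
    if (it != metresPerUnit.end()) {
        props["SizeXM"] = d.w * it->second;
        props["SizeYM"] = d.h * it->second;
    } else if (d.unit == "pixel") {
        props["SizeXPx"] = d.w;
        props["SizeYPx"] = d.h;
    }
    return props;
}
```

Going the other way (flat properties back to one XMP structure) only works when both sizes use the same kind of unit, which is the sense in which XMP covers a subset here.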

> > The second example is called "Alternative Arrays". The main usecase for
> > this is for translation of some metadata (but there are some other uses
> > in the spec). So basically in your code you have
> > QMap<QString, Value> description;
> > description["fr-fr"] = "Ma description";
> > description["en-us"] = "My description";
> >
> > then I create the metadata entry as
> > metaDataList["dc:description"] = description;
>
> This can be stored as a QMap. How to get the appropriate text out
> in the GUI is indeed not trivial, but a special type with appropriate
> display rules would fix it.

Need to look into this. Not all implications are clear ATM.

> > So what's the difference? The difference is that the spec limits the
> > number and the names of values in a structure (dimensionValue["x"] = 3.0;
> > is invalid while adding description["de-de"] = "meine Beschreibung" is
> > valid). So there is a need later on to make a distinction between those
> > two (not counting that it helps when saving if you have an idea of the
> > real type of the value).
>
> I think you mean that description["x"] = 3.0 is not valid, right? So
> you need to validate the input. Still sounds like QValidator to me
> (although i agree that a QValidator that takes QVariants would be way
> better).

Actually, it's possible to limit the applicable structure members. Each RDF
resource (a structure inside a resource as well) belongs to a class. Its class
implies limitations on applicable properties.
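
A small sketch of that check, assuming a class is described simply as a closed set of property names that are all required. ClassDef and conformsTo are hypothetical names, and values are kept as strings to keep the example short.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical class description: the closed set of properties that a
// resource of this class may carry. All of them are treated as required.
struct ClassDef {
    std::set<std::string> properties;
};

// A structure conforms to its class when it has exactly the class's
// properties: nothing unknown and nothing missing.
bool conformsTo(const std::map<std::string, std::string> &structure,
                const ClassDef &cls)
{
    if (structure.size() != cls.properties.size()) {
        return false;
    }
    for (const auto &entry : structure) {
        if (cls.properties.count(entry.first) == 0) {
            return false;
        }
    }
    return true;
}
```

For a Dimension class with properties w, h and unit, a structure with an extra "x" or without "h" is rejected, which is the check asked for earlier in the thread. An open-ended map such as the language alternatives would simply not be given a closed class.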

> > > >  - all the above can be more or less hacked in QVariant using
> > > > UserType, even if a real API to manipulate them is better. But there
> > > > is another problem which needs an extended QVariant. In XMP a value
> > > > can be associated with a "Property qualifier" (unless doing something
> > > > really horrible like storing all values in QList<QVariant> with the
> > > > first item the value and the second item the property qualifier...).
> > > > The typical use case of a "property qualifier" is a list of authors
> > > > of a document, imagine for instance a book with illustrations in it,
> > > > you would have two authors "Paul" (the text writer) and "Jane" (the
> > > > drawer), a "property qualifier" allows you to indicate that Paul was
> > > > the text writer and Jane "the drawer".

"Property qualifiers" are another XMP hack. This practice is frowned upon in
today's semantic world, to the point of RDF serialization specs specifically
banning such constructs.

The reason is simple:

Case 1) Role Property.

XMP practice is to have an Author property and then specify author roles,
e.g. "Composer". Today's best practice is to create a Composer subproperty to
indicate the role.

The advantage is the ability to further describe what a composer is in
general, and to limit the applicability of the role to specific classes like
Music.

Case 2) Multiple "property qualifiers"
Resolved by using generic structures. This allows limits on applicable
properties etc. to be defined just like in any other case.

Problem) Ambiguity
Suppose you have defined this property qualifier:

SampleSong 	Author	"Mike"
"Mike"		Role 		"Composer"

However, in another piece of music "Mike" might be a performer:

SampleSong 	Author	"Mike"
"Mike"		Role		"Performer"

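So now the two roles get confused: both Role statements hang off the same
value "Mike", with nothing tying either of them to a particular song.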

Again, it's not a problem to implement this at the Strigi/Nepomuk level.
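
To show how the subproperty approach from Case 1 avoids the ambiguity, here is a minimal sketch with a toy triple list. Triple, SubPropertyMap, isSubPropertyOf and valuesOf are invented for this mail; this is not how Strigi or Nepomuk actually store data.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One statement: subject, property, value.
struct Triple {
    std::string subject;
    std::string property;
    std::string value;
};

// Hypothetical schema fragment: each property maps to its parent property.
using SubPropertyMap = std::map<std::string, std::string>;

// True if `property` is `ancestor` or a (transitive) subproperty of it.
// The walk is bounded by the schema size so a cyclic schema cannot loop.
bool isSubPropertyOf(const SubPropertyMap &schema, std::string property,
                     const std::string &ancestor)
{
    for (std::size_t step = 0; step <= schema.size(); ++step) {
        if (property == ancestor) {
            return true;
        }
        const auto it = schema.find(property);
        if (it == schema.end()) {
            return false;
        }
        property = it->second;
    }
    return false;
}

// All values attached to `subject` through `property` or any subproperty.
std::vector<std::string> valuesOf(const std::vector<Triple> &store,
                                  const SubPropertyMap &schema,
                                  const std::string &subject,
                                  const std::string &property)
{
    std::vector<std::string> out;
    for (const Triple &t : store) {
        if (t.subject == subject
            && isSubPropertyOf(schema, t.property, property)) {
            out.push_back(t.value);
        }
    }
    return out;
}
```

With Composer and Performer declared as subproperties of Author, a song stored as (SampleSong, Composer, "Mike") still answers a query for Author, while the role stays attached to that one statement instead of to the value "Mike".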

> > > This type of information is indeed harder, but we should solve it at
> > > the KDE level, not for one particular app. A UserType is not a hacked
> > > solution in my opinion if it serializes nicely.
> >
> > And once again you miss the point :) Maybe I shouldn't have spoken of the
> > UserType in this sentence. So forget about the first line and reread the
> > rest.
>
> I do not see how I missed the point. All data can be represented as a
> QVariant. A QVariant is recursive after all. In this case you could
> use a QList<QList<QString> > variant. This does require some difficult
> typing and I can understand that one would want to do it more nicely.
> So why not extend what we have instead of starting a different
> framework. If you have the stuff to solve these problems, why not
> share it with the rest of KDE by extending KFileMetaInfo e.g. by
> subclassing or wrapping it. Then when the feature set becomes more
> clear the improvements could go into the core.
>
> > > > The whole framework also lacks a validation Schema but that's a slight
> > > > problem that could be added at Krita level, even if I do think it's
> > > > better to have Value (QVariant), Entry (KFileMetaInfoItem) and Schema
> > > > interacting closely together, for instance to have
> > > > KFileMetaInfoItem::setValue call the Schema to check if it can accept
> > > > this value and then for instance try to convert it if
> > > > possible.
> > >
> > > QValidator in KFileWritePlugin
> >
> > As you are aware QValidator takes a QString as input and returns true if
> > the string is a valid answer.
> > What is needed here is more complex, among other things:
> >  - each tag is associated to a single type, so when the tag is
> > initialized it needs to be initialized with the correct value
> >  - structures need to be checked to see if they have the correct fields
> > (no field "x" for the Dimension and not missing a field "h")
>
> Yes this sounds good. Something like this would not be easy though.

Still not too late to fix this.

> > > > > I'd love to help you with this effort, since obviously XMP is an
> > > > > important file format.
> > > >
> > > > XMP isn't really a file format :) But yes it is very important, and
> > > > not only for image applications, but for most document applications,
> > > > it has applications for Video and Audio as well, even if in those areas
> > > > the use of XMP is even less widespread than for Images.
> > >
> > > And that is the reason it should be incorporated in the Nepomuk
> > > ontology and work with Strigi and KFileMetaInfo. If this is not
> > > possible, then these frameworks are inadequate and should be enhanced.
> >
> > I am a bit confused now, but I talked with Sebastian about XMP/Exif/IPTC
> > metadata and he told me Nepomuk couldn't help me directly in that matter,
> > so I must have missed something.

Actually Nepomuk does have EXIF mapping.

No XMP mapping yet, but nowhere in the spec is XMP listed as a banned standard
with which no interoperability must be implemented, violators punished by 
death.

Hard to figure out what Sebastian meant exactly from this vague phrase :)

> > > except as Strigi analyzers and KFileWritePlugins. By going your own
> > > way, you completely miss the point of the semantic desktop, which is
> > > to allow all applications to know more about files and other objects
> > > on the desktop. If krita speaks its own language, it cannot talk to
> > > the other desktop apps and you lose the advantages this brings. This
> > > is more effort than writing your own stuff because you have to
> > > converse with others to get it working and agree on a common approach.
> > > This is not always easy, but the result is all the more valuable.
> >
> > I really really don't see any loss. The biggest part of all this is
> > already handled by other libraries that are shared.
> >
> > > > And honestly I do believe that it is better to do it this way than twist
> > > > either framework to adjust to the need of the other.
> > >
> > > If you like to be on an island then yes.
> >
> > ... (I really love to see that kind of comment, that's soooo constructive)
>
> Well, just think it's so unfair that only krita will have this stuff.
> I first learned about this stuff on your blog and was a bit baffled
> that there's a KDE project so completely separated from the existing
> architectures. The power of KDE is having good frameworks and sharing
> them.
>
> I've seen too much effort being lost by reinventing incompatible
> technologies and given the effort going into Nepomuk and Strigi, doing
> metadata incomplete, but compatible with that is much more valuable
> than doing it more complete but incompatible.

Assuming that XMP is any good (otherwise why bother?):

Suppose initially Krita will have XMP support and strigi/nepo won't.
Eventually though (since we assume it's worth it), strigi/nepo will be forced
to implement it due to demand. Apps will use the generic KDE metadata
framework to access XMP anyway.

Given the openness of the projects, you can't really talk about "Islands"
(maybe only for a moment). Even if both projects outright refuse to cooperate,
you can expect the community to force-feed them whatever code is necessary.

Collaboration here is inevitable I'm afraid :)

Cyrille, I invite you to discuss this issue in depth with me, Jos and
Sebastian on either ML/IRC/Jabber (freenode channel #strigi; Jabber:
Phreedom at jabber.org).

So far I think 70%+ of the problems come down to a general lack of
understanding on both sides, since none of us has an in-depth understanding
of all the projects involved.
25% is mostly small fixes.
5% might cause trouble, but you never know unless you try.

It's sad that this issue surfaced so late, but it's better than too late.

-- Evgeny