XMP, Krita, KFileMetaInfo and Strigi

Cyrille Berger cberger at cberger.net
Fri Jun 22 22:44:54 BST 2007


First, please don't generalize from my examples: most of the time they are a 
small part extracted from metadata specifications (two of them are around 
150 pages; even if not all of it is of interest, it's much more than I 
can summarize here).

> > > What is important that all KDE apps show the same metadata for files.
> >
> > What do you mean by that ?
>
> In konqueror 3 you can edit metadata of various files because of the
> shared metadata code.
I guess you mean 4. I am attempting to build it right now to see that in 
action. It might take a while, but I have dug through lxr.kde.org to find 
the editor. I hope I am wrong, but I got the impression that the editing of 
metadata happens with a QListView. You will really have a hard time 
convincing me that this is the best way to edit metadata. (Note: in the time 
it took me to write this, kdebase has indeed failed to build, so I can't 
check in real life.)

> This functionality is inherited by all KDE applications.
> Having well thought through shared functionality like 
> this can benefit many apps if it is in a core library.
> Consistency is important. A field 'title' should have the same value
> in any application that shows metadata for a file or other resource.

Where is the spec of KDE metadata? Because you do realize that what is 
called 'title' in KDE metadata might very well be called differently in 
another specification (OK, not much for 'title', as it corresponds 
to 'dc:title' in XMP / Dublin Core, but I have seen the tag for author being 
referenced as 'Author', 'Creator', 'Credit' and god knows what else). So if 
you want to do this you need to either write specifications or use existing 
ones; I have no idea which course you have chosen.

> > > If we start using different libs for this, this will not happen and
> > > that is very confusing for the user.
> >
> > I don't think a user should ever see a library, or even be told
> > about a library. A user just has to see a user interface, but unless I
> > have missed something, that's not what we are talking about?
>
> Yes, the confusion I talk about is different apps showing different
> metadata values. By using different code this is hard to avoid. Of
> course agreeing on a standard is also possible, but would result in
> code duplication and is also more errorprone.
I don't see any safeguard in strigi/kfile*. Nothing prevents an application or 
a user from adding a tag called "Titre" (title in French).

> > > > But the biggest problems and limitations comes from KFileMetaInfoItem
> > > > - QVariant is not good enough:
> > > >  - it lacks some important types like rationals
> > >
> > > a double is not a rational? QVariant::Double
> >
> > Yes it is, but a rational isn't a double, it's a division between two
> > integers.
> > For instance, there is no double representation of the rational 2 / 3
> > (0.66666666666666666666666666666666 is just an approximation, even if the
> > mathematical theory says that 0.666 followed only by 6 is equal to 2 / 3,
> > but there is no way to represent such a thing in double, 2/3 is
> > represented by 0.666..67 in a double anyway).
>
> It is very simple and not expensive to find a nice rational from a
> double.
Is it? Good luck finding the original rational for this one: 
3.52548656163114 (it's a real-life example taken from the exif tag of my 
camera) or 8.45121765136719 (ShutterSpeed taken from 
http://exif.org/samples/canon-ixus.html). The goal isn't to find "a" 
rational, which rules out 352548656163114/100...00; the goal is to find "the" 
rational (and multiplying 8.45121765136719 by 65536 isn't good enough, as 
there is no way to know that this was the denominator in the first place). 
You might say: who cares? Yes, who cares about data loss. There is one 
specific reason why the people behind exif chose to use rationals and 
not doubles: it's to avoid the loss of precision.
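
A minimal sketch of the point above, in plain C++ rather than Qt. The 
Rational type and the helper names are hypothetical, not part of exif, 
QVariant or KFileMetaInfo. It only shows that two different stored 
rationals collapse to the same double, so the stored pair cannot be 
recovered from the double alone:

```cpp
#include <cstdint>

// Hypothetical value type: keeps the two integers exactly as stored in the file.
struct Rational {
    std::int32_t numerator;
    std::int32_t denominator;

    // Lossy view, good enough for display purposes.
    double toDouble() const {
        return static_cast<double>(numerator) / denominator;
    }

    // Two rationals are "the same tag value" only if both integers match.
    bool operator==(const Rational& other) const {
        return numerator == other.numerator && denominator == other.denominator;
    }
};

// Euclid's algorithm; returns a non-negative result.
std::int32_t greatestCommonDivisor(std::int32_t a, std::int32_t b) {
    while (b != 0) {
        const std::int32_t t = a % b;
        a = b;
        b = t;
    }
    return a < 0 ? -a : a;
}

// Lowest terms: the best "a rational" that a double-to-rational search can
// hope to return, which is not necessarily "the" rational that was stored.
Rational reduced(Rational r) {
    const std::int32_t g = greatestCommonDivisor(r.numerator, r.denominator);
    if (g == 0) {
        return r;
    }
    return Rational{r.numerator / g, r.denominator / g};
}
```

With made-up values, Rational{10, 600} and Rational{1, 60} give the same 
toDouble() result but compare as different, which is exactly the 
information a double-only storage throws away.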

> For the graphical applications you're referring to zoom 
> ratios, resolutions and more are nicely expressed in ratios. The range
> for the denominator is however very limited, making it even easier to
> transform a double into a rational. There is no need to store it as
> two integers.
Please don't try to guess what graphics applications need :) You are right 
about resolution. But I couldn't care less about zoom ratios.

And you are forgetting about the camera tags. Most of them are best 
displayed as double (like the FNumber, aperture) and are easily convertible 
from double. But some aren't, like the shutter speed or the OECF 
(you might argue that OECF isn't useful to most users).

> > > >  - no distinction between ordered list / unordered list (I know it
> > > > might seems unimportant, just treat an unordered list as ordered, but
> > > > the fact is an application should never never sort an ordered list)
> > >
> > > I'm not sure I understand, if a list must be sorted, the analyzer
> > > should sort it and pass it on sorted. Can you provide the use case?
> >
> > No, you didn't understand: there are two kinds of lists, lists for which
> > order matters (a list of authors, usually sorted in order of the
> > importance of their work) and lists for which order doesn't matter (like
> > keywords). And while it's very understandable that an application wants
> > to sort a list of keywords, it is unacceptable for a list of authors.
>
> Yes, so the lists that the application might like to sort can already
> be sorted by the analyzer so that the application does not ever have
> to do the sorting. Then it will not do sorting when it is not
> appropriate.

And what if, in the editor, I add a keyword to a list of keywords without 
offering the user a way to sort the list? I might find that acceptable, even 
if I would much, much prefer to have a check in the program.
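
One way such a check could look, as a rough sketch only. ListKind, TagList 
and sortIfAllowed are hypothetical names, and plain std:: containers stand 
in for the Qt ones:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Whether the order of the items carries meaning.
enum class ListKind { Ordered, Unordered };

// Hypothetical list value that remembers which kind it is.
struct TagList {
    ListKind kind;
    std::vector<std::string> items;
};

// Sorts the list only when its order carries no meaning.
// Returns true if the list was sorted, false if the request was refused.
bool sortIfAllowed(TagList& list) {
    if (list.kind == ListKind::Ordered) {
        return false;  // e.g. a list of authors: never reorder
    }
    std::sort(list.items.begin(), list.items.end());
    return true;
}
```

The analyzer can still hand over keywords pre-sorted; the flag only makes 
sure that an editor which later appends to the list cannot reorder authors 
by accident.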

> > > >  - there are two associatives array in XMP (and only QMap in
> > > > QVariant), Structure and Alternative Array, there is in fact a huge
> > > > difference between the two, Structure has a limited set of possible
> > > > keys, while Alternative Array is not limited
> > >
> > > Sounds like QValidator + QMap to me. Input validation when editing is
> > > important. That is why KFileWritePlugin provides access to a
> > > QValidator for the widgets.
> >
> > Unrelated.
> > Let's take an example out of the XMP spec; it defines a Dimension
> > structure as follows:
> >  - one field named "w" of type real
> >  - one field named "h" of type real
> >  - one field named "unit" of type string
> >
> > How I would use that structure in the code is:
> > QMap<QString, Value> dimensionValue;
> > dimensionValue["w"] = 10.0;
> > dimensionValue["h"] = 20.0;
> > dimensionValue["unit"] = "cm";
> >
> > Then I create a metadata entry:
> > metaDataList["nameofthetag"] = Value(dimensionValue, Value::TypeStructure
> > );
>
> How is this better than simply width and height with a default unit
> where the GUI decides in which appropriate unit to display?

I knew I was taking too easy an example :) See IPTC Core 
(http://www.iptc.org/IPTC4XMP/) for a much more complex example of a structure. 
Besides, you would have to prefix the tag with "ImageSize", as in "ImageSize.w" 
and "ImageSize.h" (it's purely fictional, it's not used for ImageSize, but I am 
too tired to do a proper search in the XMP spec). So what, it's not much of a 
problem, you would say? It makes things much more complicated when you want 
to validate and when you want to search metadata; there is a reason why there 
are "Nodes" in file systems.

As for the GUI, in a model/view system the view doesn't need to bother with 
the model and vice versa. So yes, the user should see QLabel("Width : ") | 
QIntegerSpinBox | QComboBox (with a list of units).

> > The second example is called "Alternative Arrays". The main usecase for
> > this is for translation of some metadata (but there are some other uses
> > in the spec). So basically in your code you have
> > QMap<QString, Value> description;
> > description["fr-fr"] = "Ma description";
> > description["en-us"] = "My description";
> >
> > then I create the metadata entry as
> > metaDataList["dc:description"] = description;
>
> This is can be stored as a QMap. How to get the appropriate text out
> in the GUI is indeed not trivial, but a special type with appropriate
> display rules would fix it.
Euh, we don't have the same sense of what is easy and difficult :) What should 
appear in the UI is the local language if available; as for editing, it's 
easy, all values should be shown.
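
A small sketch of that display rule. The function name textForLocale is 
hypothetical, and the two fallback steps (same language with another 
region, then any entry at all) are assumptions added for the 
illustration; the rule stated above is only "the local language if 
available":

```cpp
#include <map>
#include <string>

// Picks the text to display for `locale` from a language alternative.
// 1. exact match ("fr-fr")
// 2. assumption: same language, other region ("en-gb" falls back to "en-us")
// 3. assumption: first entry in key order, or an empty string if there is none
std::string textForLocale(const std::map<std::string, std::string>& alternatives,
                          const std::string& locale) {
    const auto exact = alternatives.find(locale);
    if (exact != alternatives.end()) {
        return exact->second;
    }

    // Language part of the locale: everything before the first '-'.
    const std::string language = locale.substr(0, locale.find('-'));
    for (const auto& entry : alternatives) {
        const std::string& key = entry.first;
        const bool samePrefix = key.compare(0, language.size(), language) == 0;
        const bool endsThere = key.size() == language.size() ||
                               (key.size() > language.size() && key[language.size()] == '-');
        if (samePrefix && endsThere) {
            return entry.second;
        }
    }

    return alternatives.empty() ? std::string() : alternatives.begin()->second;
}
```

An editor would ignore this function and simply iterate over the whole 
map to show every translation.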

> > So what's the difference? The difference is that the spec limits the
> > number and the names of values in a structure (dimensionValue["x"] = 3.0;
> > is invalid, while adding description["de-de"] = "meine Beschreibung" is
> > valid). So there is a need, for later purposes, to make a distinction
> > between those two (not counting that it helps when saving if you have an
> > idea of the real type of the value).
>
> I think you mean that description["x"] = 3.0 is not valid, right? So
> you need to validate the input. Still sounds like QValidator to me
> (although i agree that a QValidator that takes QVariants would be way
> better).

No I mean that there is no "x" field in the structure describing Dimension.
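
To make the distinction concrete, here is a rough sketch with 
hypothetical names (MapKind, MapValue), plain std:: containers, and 
values kept as strings for brevity. Both kinds are maps, but only the 
structure has a closed set of keys:

```cpp
#include <map>
#include <set>
#include <string>

// The two associative kinds discussed above.
enum class MapKind { Structure, AlternativeArray };

// Hypothetical map value that knows which kind it is.
struct MapValue {
    MapKind kind;
    std::set<std::string> allowedFields;        // only used for Structure
    std::map<std::string, std::string> fields;  // values as strings, for brevity

    // Stores the value and returns true, or returns false when the key is
    // not one of the fields a Structure is allowed to have.
    bool set(const std::string& key, const std::string& value) {
        if (kind == MapKind::Structure && allowedFields.count(key) == 0) {
            return false;
        }
        fields[key] = value;
        return true;
    }
};
```

With a Dimension structure declared with the fields "w", "h" and "unit", 
set("x", "3.0") is refused, while an alternative array accepts "de-de" or 
any other key.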

> > > >  - all the above can be more or less hacked in QVariant using
> > > > UserType, even if a real API to manipulate them is better. But there
> > > > is an other problem which need an extended QVariant . In XMP value
> > > > can be associated to a "Property qualifier" (unless doing something
> > > > really horrible like storing all value in QList<QVariant> with first
> > > > item the value and the second item the property qualifier...). The
> > > > typical use case of "property qualifier" is for a list of author of a
> > > > document, imagine for instance a book with illustrations in it, you
> > > > would have two authors "Paul" (the text writer) and "Jane" (the
> > > > drawer), "property qualifier" allow to indicates that Paul was the
> > > > text writer and Jane "the drawer".
> > >
> > > This type of information is indeed harder, but we should solve it at
> > > the KDE level, not for one particular app. A UserType is not a hacked
> > > solution in my opinion if it serializes nicely.
> >
> > And once again you miss the point :) Maybe I shouldn't have spoken of the
> > UserType in this sentence. So forget about the first line and reread the
> > rest.
>
> I do not see how I missed the point. All data can be represented as a
> QVariant. A QVariant is recursive after all. In this case you could
> use a QList<QList<QString> > variant. This does require some difficult
> typing and I can understand that one would want to do it more nicely.
> So why not extend what we have instead of starting a different
> framework. If you have the stuff to solve these problems, why not
> share it with the rest of KDE by extending KFileMetaInfo e.g. by
> subclassing or wrapping it. Then when the feature set becomes more
> clear the improvements could go into the core.

Unless I have missed something about the API and C++ (which I might have), I 
can't extend KFileMetaInfoItem: it's passed by value, not by pointer. Or did 
you mean storing my content as a QList< QList< QVariant> > in a 
KFileMetaInfoItem, which I would convert after KFileMetaWrite/Read (or 
whatever their names are) to a KisFileMetaInfoItem (deriving from 
KFileMetaInfoItem) with my extension to nicely access those tags? (I will 
refrain from commenting until I am sure that's what you meant.)
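
For reference, a rough sketch of what such a conversion step might look 
like for the "property qualifier" case, using plain std:: containers in 
place of QList/QVariant. QualifiedValue, fromPair and fromNestedList are 
hypothetical names, and the positional layout {value, qualifier} is the 
nested-list encoding discussed above, not an existing convention:

```cpp
#include <string>
#include <vector>

// Hypothetical typed form: a value plus one qualifier (e.g. an author's role).
struct QualifiedValue {
    std::string value;
    std::string qualifier;
};

// Decodes one positional pair {value, qualifier}.
// A missing qualifier (or a missing value) becomes an empty string.
QualifiedValue fromPair(const std::vector<std::string>& pair) {
    QualifiedValue result;
    if (!pair.empty()) {
        result.value = pair[0];
    }
    if (pair.size() > 1) {
        result.qualifier = pair[1];
    }
    return result;
}

// Decodes the whole nested list, keeping the original order.
std::vector<QualifiedValue> fromNestedList(
        const std::vector<std::vector<std::string> >& nested) {
    std::vector<QualifiedValue> result;
    for (const auto& pair : nested) {
        result.push_back(fromPair(pair));
    }
    return result;
}
```

The weak spot is visible in fromPair: nothing in the nested list itself 
says which position holds the value and which holds the qualifier.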

> > > > The whole framework also lack validation Schema but that's a slight
> > > > problem that could be added at Krita level, even if I do think it's
> > > > better to have Value (QVariant), Entry (KFileMetaInfoItem) and Schema
> > > > interacting close togther, for instance to have
> > > > KFileMetaInfoItem::setValue call the Schema to check if it can accept
> > > > this value and then for instance try to convert it if
> > > > possible.KFileMetaInfoItem
> > >
> > > QValidator in KFileWritePlugin
> >
> > As you are aware, QValidator takes a QString as input and returns true if
> > the string is a valid answer.
> > What is needed here is more complex, among other things:
> >  - each tag is associated to a single type, so when the tag is
> > initialized it needs to be initialized with the correct value
> >  - structures need to be checked to see if they have the correct fields
> > (no field "x" for the Dimension and not missing a field "h")
>
> Yes this sounds good. Something like this would not be easy though.
Euh, once again we don't have the same idea of what is easy or not :)
It's not yet in the API of Krita, but here is how I see things:
class Schema {
	public:
		/// return the type of the tag
		Type typeForTag(QString);
		/// return the name of the type, useful to get the name of the structure
		QString typeNameForTag(QString);
		/// return the validation schema for the structure
		Schema* schemaForStructure(QString structureName);
		/// return true if the tag is part of the schema
		bool tagExistInSchema(QString);
};

Maybe the return type of schemaForStructure should be different from Schema 
(an XMP schema has a namespace/URI, which a structure doesn't, among other 
differences, but for now that will do).

and then in Entry:

class Entry {
	public:
		QString name();
		const Schema* schema();
		void setValue(Value);
};

void Entry::setValue(Value v)
{
	if(v.type() == schema()->typeForTag(name()))
	{
		// if a structure, do recursive checking
	} else {
		// Attempt to convert if possible
	}
}
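
A self-contained version of the same idea, as a sketch only: it uses 
std:: types instead of Qt, the Type enum, the Value layout and the 
declare() helper are assumptions made for the illustration, the only 
conversion attempted is integer to real, and the recursive check of 
structures is left out:

```cpp
#include <map>
#include <string>
#include <utility>

enum class Type { Invalid, Integer, Real, Text };

// Simplified stand-in for Value: a type tag plus a numeric or text payload.
struct Value {
    Type type;
    double number;
    std::string text;
};

class Schema {
public:
    // Hypothetical helper to register a tag and its type.
    void declare(const std::string& tag, Type type) { types_[tag] = type; }

    bool tagExistInSchema(const std::string& tag) const {
        return types_.count(tag) != 0;
    }

    // Returns Type::Invalid for a tag that is not part of the schema.
    Type typeForTag(const std::string& tag) const {
        const auto it = types_.find(tag);
        return it == types_.end() ? Type::Invalid : it->second;
    }

private:
    std::map<std::string, Type> types_;
};

class Entry {
public:
    Entry(std::string name, const Schema* schema)
        : name_(std::move(name)), schema_(schema), value_{Type::Invalid, 0.0, ""} {}

    // Accepts the value if its type matches the schema, or if an Integer
    // can be converted to the Real the schema expects. Returns false and
    // leaves the entry unchanged otherwise.
    bool setValue(const Value& v) {
        const Type expected = schema_->typeForTag(name_);
        if (expected == Type::Invalid) {
            return false;  // unknown tag, e.g. "x" in a Dimension
        }
        if (v.type == expected) {
            value_ = v;
            return true;
        }
        if (v.type == Type::Integer && expected == Type::Real) {
            value_ = v;
            value_.type = Type::Real;
            return true;
        }
        return false;
    }

    const Value& value() const { return value_; }

private:
    std::string name_;
    const Schema* schema_;
    Value value_;
};
```

Returning a bool instead of void is a choice made here so that the caller 
can tell when a value was refused.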

> > > > > I'd love to help you with this effort, since obviously XPM is an
> > > > > important file format.
> > > >
> > > > XMP isn't really a file format :) But yes it is very important, and
> > > > not only for image applications, but for most document applications;
> > > > it has applications for video and audio as well, even if in those areas
> > > > the use of XMP is even less widespread than for images.
> > >
> > > And that is the reason it should be incorporated in the Nepomuk
> > > ontology and work with Strigi and KFileMetaInfo. If this is not
> > > possible, then these frameworks are inadequate and should be enhanced.
> >
> > I am bit confused now, but I talk with Sebastien about XMP/Exif/IPTC
> > metadata and he told me Nepomuk couldn't help me directly in that matter,
> > so I must have missed something.
>
> Nepomuk is setting up an ontology for describing the relations between
> many data types. It can probably not solve all problems you have, but
> perfection is the enemy of success. It is better to collaborate on the
> simple cases and let the only the complex cases be unique to some
> applications.
In fact I don't see how it solves any of the problems I have yet :) A summary 
of the result of the discussion I had with Sebastien is here: 
http://wiki.koffice.org/index.php?title=Krita/Metadata#Metadata_and_Nepomuk. 
And really, it's part of the nice things I want to have, but I think it will 
be delayed to Krita 2.1 or 2.2. And it should probably be the result of common 
work with the other graphics applications of KDE.

> We would love to have your input on the ontology. You can find it here:
I guess you forgot something :)

> > > > Anyway, here is how I see sharing code, and in a way that don't bloat
> > > > both framework, I much prefer to have things keep simple, especially
> > > > that neither KisMetaData nor the base KMetaData library
> > > > (KFileWritePlugin + KFileMetaInfoItem) are that big.
> > >
> > > I completely agree, but I dont think KisMetaData should exist at all
> >
> > There is other stuff missing in the strigi analyzers and KFile*, but it
> > doesn't need changes, just additions, so I didn't speak about it
> > (unless you want me to?).
>
> If it's not too much work, yes please.

It's still very much at the thinking stage (some of it is partially available 
on the koffice wiki: http://wiki.koffice.org/index.php?title=Krita/Metadata ):
- filters are applied to a list of metadata when outputting, to either add or 
remove data
 * anonymizer: I have received complaints that Krita saves the name of the 
author and some other personal information when it saves JPEG and PNG, so the 
anonymizer filter will be in charge of removing all personal information
 * automatic filling of some tags, like changing the date of the last edit, 
or the name of the program which created the file
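
A rough sketch of the two filters, assuming metadata is a flat map from 
tag name to string. The MetaData alias and the function names are 
hypothetical, the choice of which tags count as personal is left to the 
caller, and xmp:CreatorTool is used as the tag for the program name:

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical flat representation: tag name -> value.
using MetaData = std::map<std::string, std::string>;

// Anonymizer: removes every tag listed in `personalTags`.
// Returns the number of tags that were actually removed.
std::size_t anonymize(MetaData& metadata, const std::set<std::string>& personalTags) {
    std::size_t removed = 0;
    for (const auto& tag : personalTags) {
        removed += metadata.erase(tag);
    }
    return removed;
}

// Automatic filling: overwrites the tag holding the name of the program.
void fillCreatorTool(MetaData& metadata, const std::string& programName) {
    metadata["xmp:CreatorTool"] = programName;
}
```

Both functions run on the list just before it is written out, so the 
metadata kept in memory for the layers is not affected.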

- merging: in Krita, tags are not associated to the file but to layers (I am 
still pondering whether a metadata list should be associated to the image or 
not, but it raises so many more problems). In 1.x, when two layers are merged, 
the metadata is simply erased; it's the easiest solution to the problem and 
probably not the most "wrong". For 2.0 I want some merging mechanism, for 
instance list of authors of the merged layer = { list for layer 1, list for 
layer 2 } (removing duplicated entries, and randomly selecting which author 
should appear in first position), etc...
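
The author-list case could be sketched as follows. mergeAuthors is a 
hypothetical name, and note one deliberate difference: instead of picking 
the first position randomly, this sketch keeps the authors of layer 1 
first, simply to make the result predictable:

```cpp
#include <set>
#include <string>
#include <vector>

// Concatenates the two author lists, dropping duplicates and keeping the
// first occurrence of each name. Layer 1 authors come first (an assumption
// of this sketch, not the random choice mentioned above).
std::vector<std::string> mergeAuthors(const std::vector<std::string>& layer1,
                                      const std::vector<std::string>& layer2) {
    std::vector<std::string> merged;
    std::set<std::string> seen;

    const auto append = [&merged, &seen](const std::vector<std::string>& authors) {
        for (const auto& name : authors) {
            // insert() reports whether the name was new.
            if (seen.insert(name).second) {
                merged.push_back(name);
            }
        }
    };

    append(layer1);
    append(layer2);
    return merged;
}
```

Other tags would need their own merge rule; a keyword list can be merged 
the same way, while a single-valued tag such as a title cannot.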

- licence check: Krita should warn if you are including a "commercial" licence 
artwork in an image that includes a GPL-licensed artwork (but how to do that 
is very much an open issue)

There is something else which is very, very Krita-specific: there is a history 
tag somewhere in the XMP specification (xmpMM:History), where applications 
are invited to store the list of actions which resulted in that piece of 
artwork (but that relies on a part of Krita which has yet to be written).


> Well, just think it's so unfair that only krita will have this stuff.
Which stuff ? 

> I first learned about this stuff on your blog and was a bit baffled
I have already spoken of XMP on this list.

> that there's a KDE project so completely separated from the existing
> architectures. The power of KDE is having good frameworks and sharing
> them.
I will probably be flamed with a flame thrower, but that's why I am not 
convinced by how the development of KDE4 happened. Maybe I am wrong, but it's 
really the feeling I have: libraries were developed too disconnected from 
applications. I really think that a subsystem should be developed as a private 
part of an application (or a module), even if two applications are working on 
similar projects, and then, once it has matured, once it supports all the 
needed features, be moved up into kdelibs, merged with its competitors (if 
any), and completed with new features. Trying to build a library from 
scratch isn't an easy task. Just my opinion :)

I also have the feeling that too many decisions are taken at Akademy (or even 
at some other special events), which leaves people who didn't come out of the 
loop and can only lead to clashes at some point. So that's a hint for the next 
Akademy: don't take hard, solid decisions, and once you have talked about 
something, involve the people who weren't present.

> I've seen too much effort being lost by reinventing incompatible
> technologies and given the effort going into Nepomuk and Strigi, doing
> metadata incomplete, but compatible with that is much more valueable
> than doing it more complete but incompatible.
No, it must _be_ complete and compatible with the standards of the graphics 
world. I much prefer Krita to be a little bit isolated from KDE than from 
its target audience, really. If using Krita inside KDE gives a better user 
experience, so much the better :)

-- 
Cyrille Berger




More information about the kde-core-devel mailing list