loud kfilemetainfo thinking

Mon Jan 15 23:26:39 GMT 2007

Hi all,

You've probably looked at KFileMetaInfo and KFileMimeTypeInfo. In
moving these classes forward to the semantic desktop, I think there is
quite a bit of low-hanging fruit to be gotten by keeping large parts
of the API but changing the implementation of metadata reading and
writing.

I will give you a small idea of what I think should happen to these
classes. In my opinion the prominent role of the mimetype should be
abandoned. We cannot continue working on the assumption that the mimetype
always accurately predicts which files contain which fields and that
one mimetype maps to one KFilePlugin. A mimetype is just one of many
pieces of metadata that can be associated with a file. It is quite
possible that a file has multiple mimetypes. An OpenOffice text document
arguably also has the mimetypes application/x-zip and
application/octet-stream. When getting data from this file you might be
interested in the zipfile aspects of it.
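
To make the "multiple mimetypes" idea concrete, here is a minimal C++
sketch. It is not existing KDE API: FileTypes, add(), has() and
exampleOpenOfficeTextTypes() are hypothetical names, and plain standard
C++ is used instead of Qt types to keep it self-contained.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical type: the mimetype is an ordinary multi-valued property
// of a file, not the single key that selects one plugin.
struct FileTypes {
    std::set<std::string> mimetypes;

    // Records one more mimetype that applies to the file.
    void add(const std::string &mimetype) {
        assert(!mimetype.empty());
        mimetypes.insert(mimetype);
    }

    // True if the file can be treated as the given mimetype.
    bool has(const std::string &mimetype) const {
        return mimetypes.count(mimetype) > 0;
    }
};

// Illustrative only: the types an OpenOffice text document could report.
inline FileTypes exampleOpenOfficeTextTypes() {
    FileTypes types;
    types.add("application/vnd.oasis.opendocument.text");
    types.add("application/x-zip");
    types.add("application/octet-stream");
    return types;
}
```

A caller interested in the zipfile aspects would then ask
has("application/x-zip") instead of comparing against one fixed mimetype.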

When a KFileMetaInfo object is created from a path, URL or data stream,
it is filled with metadata about the file. There are a number of
different types of data that can be obtained for each file.

Reading data:

* Metadata that is read from a file or other resource.
   This information is embedded in the file or can be calculated from
the file contents alone.
   It is extracted with a selection of end analyzers and through analyzers from Strigi.

* Metadata that is associated with a file or resource.
   Data that is stored outside of the file, e.g. in Nepomuk's RDF storage.

* Metadata that describes what type of metadata can be written into a file.
   This data is useful for showing input fields in a GUI. This data
should only be obtained if a file is a writeable block device. This
information is best gathered from a class similar to what is now the
KFilePlugin.

* Metadata that describes what type of metadata can be associated with
a file or resource.
   This data is useful for showing input fields in a GUI. This
information is provided by Nepomuk.

Writing:

* Metadata that is written into the file by a field-specific class.
KFileMetaInfo, which is the interface for writing, has a list of
writable fields, each linked to the class that does the actual writing.

* Metadata that is associated with the file and stored elsewhere. This is
handled by Nepomuk.
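
The link between writable fields and the classes that write them could
look roughly like the sketch below. It assumes a writer can be modelled
as a callable taking a file path and a new value. WritableFields,
FieldWriter, registerWriter(), canWrite() and write() are hypothetical
names, not part of the current KFileMetaInfo API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical writer signature: (file path, new value) -> success.
using FieldWriter =
    std::function<bool(const std::string &, const std::string &)>;

// Keeps the list of writable fields and dispatches each write to the
// field-specific writer registered for that field.
class WritableFields {
public:
    void registerWriter(const std::string &field, FieldWriter writer) {
        assert(static_cast<bool>(writer));  // an empty writer is a bug
        writers_[field] = std::move(writer);
    }

    // Useful for a GUI that decides which input fields to show.
    bool canWrite(const std::string &field) const {
        return writers_.count(field) > 0;
    }

    // Returns false if no writer is registered or the writer fails.
    bool write(const std::string &path, const std::string &field,
               const std::string &value) const {
        const auto it = writers_.find(field);
        if (it == writers_.end()) {
            return false;
        }
        return it->second(path, value);
    }

private:
    std::map<std::string, FieldWriter> writers_;
};
```

Keeping the writers in a separate table like this is one way to keep
the writing code apart from the code that reads metadata.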

All of this information will be available in the form of RDF triples.
A triple has the form "subject, relation, object". For file metainfo, the
subject is always the file. The 'relation' is a URI that can be considered
the fieldname. E.g. a triple might be "x.mp3,
http://purl.org/dc/elements/1.1/title, 'Controversy'",
where http://purl.org/dc/elements/1.1/title is the URI identifying the
title relationship.
This information could still be accessed with KFileMetaInfoItem
KFileMetaInfo::item(const QString &key).
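
As a rough illustration of triples with a fixed subject and lookup by
key, here is a small sketch in standard C++. Triple and MetaInfoSketch
are hypothetical names. Returning a list of values per key is an
assumption made here because a field may occur more than once; the real
item() call returns a KFileMetaInfoItem.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One RDF-style statement about a file.
struct Triple {
    std::string subject;   // always the file
    std::string relation;  // URI that acts as the fieldname
    std::string object;    // the value
};

// Hypothetical stand-in for a triple-backed KFileMetaInfo.
class MetaInfoSketch {
public:
    explicit MetaInfoSketch(std::string file) : file_(std::move(file)) {}

    // Adds a statement; the subject is filled in automatically.
    void add(const std::string &relation, const std::string &object) {
        assert(!relation.empty());
        triples_.push_back(Triple{file_, relation, object});
    }

    // Returns all values stored under the given relation URI.
    std::vector<std::string> items(const std::string &key) const {
        std::vector<std::string> values;
        for (const Triple &t : triples_) {
            if (t.relation == key) {
                values.push_back(t.object);
            }
        }
        return values;
    }

    const std::vector<Triple> &triples() const { return triples_; }

private:
    std::string file_;
    std::vector<Triple> triples_;
};
```

With this, the example triple is stored by creating
MetaInfoSketch("x.mp3") and calling
add("http://purl.org/dc/elements/1.1/title", "Controversy").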

The main differences in a new KFileMetaInfo implementation would be:
- KFileMimeTypeInfo becomes meaningless. Each file is unique, and the
information that used to come from KFileMimeTypeInfo will be available
through KFileMetaInfo. In the background this information will still
often be grouped by filetype, but in principle it is only available
_after_ file inspection.
- Code for reading and writing embedded metadata is separated.
- Information such as field type, field cardinality, human-readable
field description and translations will be stored in the ontology
files. This is an implementation detail. KFileMetaInfo users should
not (need to) see the difference.
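
To give an idea of what the per-field information from an ontology file
might look like once loaded, here is a hedged sketch. FieldInfo and its
members are hypothetical, and so are the conventions used: -1 for
"unbounded" cardinality and language codes as label keys. No actual
ontology format is implied.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical in-memory description of one field.
struct FieldInfo {
    std::string uri;     // URI identifying the field
    std::string type;    // e.g. "string" or "integer"
    int maxCardinality;  // 1 for single-valued; -1 means unbounded here
    std::map<std::string, std::string> labels;  // language code -> label

    // Label for the language, falling back to English, then to the URI.
    std::string label(const std::string &language) const {
        auto it = labels.find(language);
        if (it == labels.end()) {
            it = labels.find("en");
        }
        return it == labels.end() ? uri : it->second;
    }

    // True if one more value may be added given the current count.
    bool allowsAnotherValue(std::size_t currentCount) const {
        assert(maxCardinality >= -1);
        return maxCardinality < 0 ||
               currentCount < static_cast<std::size_t>(maxCardinality);
    }
};
```

A GUI could use label() for the caption of an input field and
allowsAnotherValue() to decide whether to offer an "add" button.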

Ok, this was just some loud thinking that I'm archiving on this list.
Feel free to comment nevertheless.

Cheers,
Jos