loud kfilemetainfo thinking

Sun Jan 21 21:30:18 GMT 2007

2007/1/16, Jos van den Oever <jvdoever at gmail.com>:
> Hi all,
>
> You've probably looked at KFileMetaInfo and KFileMimeTypeInfo. In
> moving these classes forward to the semantic desktop, I think there is
> quite a bit of low-hanging fruit to be gotten by keeping large parts
> of the API but changing the implementation of metadata reading and
> writing.
>
> I will give you a small idea of what I think should happen to these
> classes. In my opinion the prominent role of the mimetype should be
> abandoned. We cannot continue working on the assumption that mimetype
> always accurately predicts which files contain which fields and that
> one mimetype maps to one KFilePlugin. A mimetype is just one of many
> metadata that can be associated with a file. It is well possible that
> a file has multiple mimetypes. An openoffice text document arguably
> also has the mimetypes application/x-zip and application/octet-stream.
> When getting data from this file you might be interested in the
> zipfile aspects of it.
>
> When a KFileMetaInfo object is created from a path, url, datastream,
> it is filled with metadata about the file. There are a number of
> different types of data that can be gotten for each file.
>
> Reading data:
>
> * Metadata that is read from a file or other resource.
>    This information is embedded in the file or can be calculated from
> the file contents alone.
>    It is extracted with a selection of end- and throughanalyzers from Strigi.
>
> * Metadata that is associated with a file or resource.
>    Data that is stored outside of the file, e.g. in Nepomuk's RDF storage.
>
> * Metadata that describes what type of metadata can be written into a file.
>    This data is useful for showing input fields in a GUI. This data
> should only be obtained if a file is a writeable block device. This
> information is best gathered from a class similar to what is now the
> KFilePlugin.
>
> * Metadata that describes what type of metadata can be associated with
> a file or resource.
>    This data is useful for showing input fields in a GUI. This
> information is provided by Nepomuk.
>
> Writing:
>
> * Metadata that is written into the file by a field-specific class.
> KFileMetaInfo, which is the interface for writing, has a list of
> writable fields, each linked to a class that does the actual writing.
>
> * Metadata that is associated with the file and stored elsewhere; this
> is handled by Nepomuk.
>
> All of this information will be available in the form of RDF triples.
> These are "subject, relation, object". For file metainfo, the subject
> is always the file. The 'relation' is a uri that can be considered as
> the fieldname. E.g. a triple might be "x.mp3,
> http://purl.org/dc/elements/1.1/title, 'Controversy'",
> where http://purl.org/dc/elements/1.1/title is the uri identifying the
> title relationship.
> This information could still be accessed with KFileMetaInfoItem
> KFileMetaInfo::item(const QString &key).
>
> The main differences in a new KFileMetaInfo implementation would be:
> - KFileMimeTypeInfo becomes meaningless: each file is unique and the
> information gotten from KFileMimeTypeInfo will be available through
> KFileMetaInfo. In the background this information will still often be
> grouped by filetype, but in principle it is only available _after_
> file inspection.
> - Code for reading and writing embedded metadata is separated.
> - Information such as field type, field cardinality, human readable
> field description and translations will be stored in the ontology
> files. This is an implementation detail. KFileMetaInfo users should
> not (need to) see the difference.
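
To make the triple model from the quoted mail concrete, here is a minimal
sketch in plain C++. Everything in it is hypothetical: Triple,
FileMetaInfoSketch, add() and items() are names made up for illustration and
are not KDE, Strigi or Nepomuk API, and the http://example.org/mimetype URI is
an invented placeholder for a mimetype relation. It only shows that the file is
always the subject, that the relation URI plays the role of the field name, and
that a mimetype can be stored as just another field with more than one value.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only; not an existing API.
struct Triple {
    std::string subject;   // always the file, e.g. "x.mp3"
    std::string relation;  // URI that acts as the field name
    std::string object;    // the field value
};

class FileMetaInfoSketch {
public:
    explicit FileMetaInfoSketch(std::string file) : file_(std::move(file)) {}

    // Record one "file, relation, value" triple.
    void add(const std::string& relation, const std::string& value) {
        triples_.push_back(Triple{file_, relation, value});
    }

    // Return every value stored under a key. A field may occur more
    // than once, e.g. a file with several mimetypes.
    std::vector<std::string> items(const std::string& key) const {
        std::vector<std::string> values;
        for (const Triple& t : triples_) {
            if (t.relation == key) {
                values.push_back(t.object);
            }
        }
        return values;
    }

    const std::vector<Triple>& triples() const { return triples_; }

private:
    std::string file_;
    std::vector<Triple> triples_;
};
```

A caller would create FileMetaInfoSketch("x.mp3"), add the title under
http://purl.org/dc/elements/1.1/title, and look it up again with items() using
the same URI as the key, much like KFileMetaInfo::item(const QString &key).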

Nothing like replying to your own mail. This is a follow-up on my thoughts.

I'd like to split the current KFilePlugin classes into two new
classes: a reader that is a Strigi plugin and a writer that is a KDE
plugin. This distinction is made for a number of reasons.

In the current (KDE 3) implementation, the mimetype is used to
determine which plugin is loaded to analyze the file. This is a bit
odd, because some analysis must take place to know the mimetype. So in
the new scheme, the mimetype is not used to determine how to analyze a
file. A file is analyzed by Strigi and this will usually also result
in a mimetype.

When determining how to write fields into a file, we can very well use
the mimetype to determine which plugin to load. The plugins for
writing will be loaded with the usual KDE method: KMimeTypeTrader.

The reading and writing will be performed in rather different ways
(stream-based and block-based), and this is the justification for
splitting up the plugins. Also, most plugins only support methods for
reading, so the number of write plugins should be relatively small.
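
A rough sketch of this split, in plain C++ and under loud assumptions: none of
the names below (analyzeStream, MetaWriter, Mp3Writer, writerFor) exist in
Strigi or KDE, the magic-byte check is a toy stand-in for real analysis, and
the static map stands in for the lookup that KMimeTypeTrader would do. The
point is only the direction of the data flow: reading does not need a mimetype
and produces one, while writing starts from the mimetype to pick a plugin.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stream-based reader: looks at the content only and
// yields a mimetype as one result of the analysis.
std::string analyzeStream(std::istream& in) {
    char magic[4] = {0, 0, 0, 0};
    in.read(magic, 4);
    const std::streamsize n = in.gcount();
    if (n >= 3 && magic[0] == 'I' && magic[1] == 'D' && magic[2] == '3') {
        return "audio/mpeg";          // ID3 tag header
    }
    if (n == 4 && magic[0] == 'P' && magic[1] == 'K' &&
        magic[2] == 3 && magic[3] == 4) {
        return "application/x-zip";   // zip local file header
    }
    return "application/octet-stream";
}

// Hypothetical writer interface; one implementation per writable format.
struct MetaWriter {
    virtual ~MetaWriter() = default;
    virtual bool canWrite(const std::string& field) const = 0;
};

struct Mp3Writer : MetaWriter {
    bool canWrite(const std::string& field) const override {
        return field == "http://purl.org/dc/elements/1.1/title";
    }
};

using WriterFactory = std::function<std::unique_ptr<MetaWriter>()>;

// Stand-in for a mimetype-based plugin lookup. Only formats with a
// write plugin are listed, so the table stays small.
const std::map<std::string, WriterFactory>& writerRegistry() {
    static const std::map<std::string, WriterFactory> registry = {
        {"audio/mpeg",
         [] { return std::unique_ptr<MetaWriter>(new Mp3Writer); }},
    };
    return registry;
}

// Returns a writer for the mimetype, or nullptr if none is registered.
std::unique_ptr<MetaWriter> writerFor(const std::string& mimetype) {
    const auto it = writerRegistry().find(mimetype);
    if (it == writerRegistry().end()) {
        return nullptr;
    }
    return it->second();
}
```

In use, the mimetype returned by analyzeStream() would be passed to
writerFor(); a null result simply means the file is read-only as far as
embedded metadata is concerned.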