[Nepomuk] Re: Handling multiple sources of metadata
Bruce Adams
tortoise_74 at yahoo.co.uk
Tue May 3 19:24:57 CEST 2011
Hi,
That roughly accords with my original intentions anyway.
I was thinking in terms of a standalone tool, library & API
for managing simple metadata (just tags),
later growing this to support integration with Nepomuk
and to incorporate other kinds of metadata.
I'm happy to hear suggestions.
There are two main design choices to consider.
1. the location of the metadata
one per file
one metadata area per directory
one per filesystem
On balance I believe per directory makes most sense.
It is not much extra complication to say that a sub-directory does not need its
own metadata area when a parent directory already has one, and this would
keep the metadata layout simple.
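
To illustrate the per-directory idea with the parent fallback, here is a rough
sketch in Python. It is only an assumption of how a lookup could work: the
name ".metadata" is a placeholder (no name has been agreed), and
find_metadata_area is a made-up helper, not part of any existing tool.

```python
from pathlib import Path
from typing import Optional

# Placeholder name only; the directory name is still an open question.
METADATA_DIR_NAME = ".metadata"


def find_metadata_area(path: Path, name: str = METADATA_DIR_NAME) -> Optional[Path]:
    """Return the nearest metadata directory at or above `path`, or None.

    A sub-directory without its own metadata area falls back to the
    closest parent directory that has one.
    """
    path = path.resolve()
    # For a file, start the search in the directory that contains it.
    start = path if path.is_dir() else path.parent
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_dir():
            return candidate
    return None
```

The search stops at the first match, so a directory can still override its
parent simply by having a metadata area of its own.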
2. the format of the metadata
binary or text
if text: TriG, Turtle, TriX or something else.
there is an advantage to the simplicity of <key>=<value> for just tags
but it will not scale well to complex meta data.
for binary I would imagine a standard database such as sqlite.
The advantage there is compactness.
There is nothing to stop either of these being configurable, but it is sensible
to start as you mean to go on.
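
As a feel for how small the <key>=<value> option is for plain tags, here is a
minimal sketch in Python. The layout is my own assumption for illustration:
one line per file, the file name as the key and a comma-separated tag list as
the value. parse_tags and format_tags are hypothetical names.

```python
from typing import Dict, List


def parse_tags(text: str) -> Dict[str, List[str]]:
    """Parse `<key>=<value>` lines into {file name: [tags]}.

    Blank lines, lines starting with '#' and lines without '=' are skipped.
    """
    tags: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so a key cannot itself contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # malformed line, ignored in this sketch
        tags[key.strip()] = [t.strip() for t in value.split(",") if t.strip()]
    return tags


def format_tags(tags: Dict[str, List[str]]) -> str:
    """Serialise {file name: [tags]} back to `<key>=<value>` lines, sorted by key."""
    return "".join(f"{key}={','.join(values)}\n" for key, values in sorted(tags.items()))
```

The sketch also shows the limits: file names containing '=' and tags
containing ',' would need escaping, and there is no room for relations
between resources, which is where an RDF serialization would take over.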
I think metadata should live in a .metadata directory, except that .metadata is
already used by Eclipse.
This is something that should be adoptable as part of the Linux filesystem
hierarchy.
I don't think it should be .nepomuk as that might alienate GNOME users.
If all metadata is RDF, .rdf might be a good choice.
Anyway my intention is to start simple and go from there. No sense in running
before we can walk.
I will be able to test this on Windows and Linux. My primary target platform is
Linux.
I don't have access to a Mac.
The main thing is to get on and do something while I have the time and
enthusiasm.
Hopefully my plans will complement yours.
Regards,
Bruce.
----- Original Message ----
> From: Sebastian Trüg <trueg at kde.org>
> To: Bruce Adams <tortoise_74 at yahoo.co.uk>
> Cc: Nepomuk at kde.org
> Sent: Tue, May 3, 2011 5:34:37 PM
> Subject: Re: [Nepomuk] Re: Handling multiple sources of metadata
>
> For now this is mostly about removable devices such as USB keys. In a
> second step this could be applied to emails and IM messages where you
> attach the additional information to the email or the message.
>
> As far as USB keys go the idea was to define one special hidden file or
> folder in which the information would be saved as compressed text
> containing some RDF serialization - preferably trig. So the task is to
> update this file whenever the meta-data of the files on the disk are
> changed. In this case meta-data refers to non-file-meta-data, ie. tags,
> relations to people, projects, manual annotations of any kind.
>
> At first glance this is rather simple but when you look at it a bit
> deeper it becomes harder since it is not immediately clear what to save.
> Example: One file on the disk is related to a project. Thus, you store
> the project, too. But which details of the project do you store with it?
> Do you include all the participants or only the title? Do you store the
> tags the project has? And so on.
>
> The good thing is that Vishesh and I came up with a solution for the
> latter problem: we defined identifying and non-identifying properties.
> In a situation like the above you would only save the identifying
> properties of the project since that is all it takes to uniquely
> identify it, allowing it to be merged with a counterpart representing
> the same project later on.
> Without a doubt things like this need to be put into more words and be
> published. This is planned (like so many other things).
>
> Anyway, I would suggest you start with a stand-alone tool that can
> create such a file on a removable storage device when triggered
> manually. The next step would then be to integrate it with Nepomuk and
> let the updating be done automatically. After that we can look at
> importing this information as soon as the device is mounted and
> providing configuration like "Do you want meta-data to be stored on this
> device?".
>
> Cheers,
> Sebastian
>
> On 05/03/2011 03:10 PM, Bruce Adams wrote:
> >
> > Hi,
> >
> > I'm certainly interested. How much time I can dedicate to it is another
> > matter.
> >
> > Do you have a particular scheme in mind?
> >
> >
> > Incremental improvements aside this also overlaps with network shares.
> > How do you get data from one server to another assuming both are running
> > Nepomuk?
> > To tackle that properly you need to tackle security and multi-user issues.
> > With the file-system approach you can leave it to the OS.
> >
> > Regards,
> >
> > Bruce.
> >
> >
> > ----- Original Message ----
> >> From: Sebastian Trüg <trueg at kde.org>
> >> To: nepomuk at kde.org
> >> Sent: Tue, May 3, 2011 10:14:26 AM
> >> Subject: [Nepomuk] Re: Handling multiple sources of metadata
> >>
> >> Hi Bruce,
> >>
> >> this is what is done:
> >> - We store everything in one db
> >> - We index file metadata like ID3 or XMP tags
> >> - We have a GSoC project for metadata writeback, ie. changed metadata in
> >> the db will be written back to the file if possible
> >>
> >> about extended attributes:
> >> - AFAIK most distributions disable them by default.
> >> - they are not supported by file systems such as FAT, which is used on
> >> most USB keys. Thus, they do not increase interoperability much
> >>
> >> The idea we have is to store the metadata on the filesystem itself in a
> >> cross-platform way. This has been looked into but we need someone to
> >> really do it. Are you interested?
> >>
> >> Cheers,
> >> Sebastian
> >>
> >> On 05/03/2011 01:33 AM, Bruce Adams wrote:
> >>>
> >>> Hi,
> >>> I am revisiting the idea of file tagging again.
> >>> There are potentially several places to store meta data.
> >>>
> >>> 1 Stored in a database.
> >>> 2 Embedded in the file format. E.g XMP/EXIF
> >>> 3 Stored in extended file attributes
> >>> 4 Stored in a special meta-data file associated with the original file.
> >>>
> >>> Embedded data is explicitly mentioned here:
> >>>
> >>> http://api.kde.org/4.0-api/kdelibs-apidocs/nepomuk/html/index.html
> >>>
> >>>
> >>> with ID3 tags used as the example.
> >>> What about XMP tags added, for example, in digikam?
> >>>
> >>> Unless I am mistaken Nepomuk currently only uses its own database.
> >>> I understand the reason for this approach is that the database solution is the
> >>> only one that works for all cases.
> >>> (Though the link to the FAQ where I was looking was broken)
> >>> I personally think it is wrong to make it the primary location as losing
> >>> metadata when you copy files around is broken behaviour.
> >>>
> >>> I was wondering (especially with a sprint potentially coming) what the ideal
> >>> system would be.
> >>> This is revisiting old ground but bitrot seems to have affected my google search
> >>> results so forgive me re-asking old questions.
> >>>
> >>> If you have multiple sources of the same data and they disagree which should be
> >>> considered primary?
> >>> Who is responsible for syncing them if they disagree?
> >>>
> >>> My thinking is as follows:
> >>>
> >>> File embedded data is primary.
> >>> Extended file attributes are secondary and should only be used for data when the
> >>> file format does not allow for embedding.
> >>> Meta data associated with the original file is a simulation of the above and hence
> >>> comes next.
> >>>
> >>> The database is last but definitely not least.
> >>> If it is able the server should sync the data.
> >>>
> >>> For example:
> >>> Given an image tagged in nepomuk (e.g. via gwenview) nepomuk or a service on
> >>> its behalf should
> >>> add the embedded tags itself (on gwenview's behalf - assuming gwenview didn't do
> >>> it)
> >>>
> >>>
> >>> Given an image tagged outside of nepomuk (e.g. in digikam) nepomuk should
> >>> import the tags into its database
> >>> the next time it needs to query the file (or when indexing it).
> >>>
> >>> Similarly I think extended file attributes should be imported/exported where the
> >>> file system supports them
> >>> and with an optional fall back to simulating them with .metadata files or
> >>> similar.
> >>>
> >>> I read something alluding that extended file attributes are unsuitable for
> >>> nepomuk data as they are stored as pairs
> >>> whereas nepomuk uses triples. Hyperlinks to the details were either missing or
> >>> broken.
> >>> I'm not sure I understand the problem. Surely both triples and pairs can be
> >>> converted between easily enough when the
> >>> base representations are strings? E.g. "A" "B" "C" becomes either "A:B" "C" or
> >>> "A" "B:C"
> >>>
> >>>
> >>> Are there some other limitations on extended file attributes that I'm not aware
> >>> of?
> >>>
> >>> At the risk of re-opening old wounds I notice beagle uses extended attributes so
> >>> I assume xesam does too.
> >>> This would be another way to promote interoperability in spite of incompatible
> >>> ontologies.
> >>> For the case of simple file tagging nepomuk <-> XMP <-> xesam might work for
> >>> example with only the simple
> >>> digikam ontology as an interchange. xmp.digikam.TagsList for example.
> >>>
> >>> Any thoughts?
> >>>
> >>> Regards,
> >>>
> >>> Bruce.
> >>> _______________________________________________
> >>> Nepomuk mailing list
> >>> Nepomuk at kde.org
> >>> https://mail.kde.org/mailman/listinfo/nepomuk
> >>>
> >>
> >
>