[Nepomuk] Re: Handling multiple sources of metadata

Wed May 4 16:52:20 CEST 2011

----- Original Message ----
> From: Sebastian Trüg <trueg at kde.org>
> To: Bruce Adams <tortoise_74 at yahoo.co.uk>
> Cc: Nepomuk at kde.org
> Sent: Wed, May 4, 2011 9:53:05 AM
> Subject: Re: [Nepomuk] Re: Handling multiple sources of metadata
> 
> Hi Bruce,
> 
> On 05/03/2011 07:24 PM, Bruce Adams wrote:
> > That roughly accords with my original intentions anyway.
> > I was thinking in terms of a standalone tool, library & API
> > for managing simple metadata (just tags)
> 
> IMHO it does not make sense to start with tags alone. I think it would
> be much simpler to only start with literal properties, i.e. those for
> which there is no need to store additional resources.
> Then the next step would be to also store the additional resources, which
> gets much more complicated as it also involves garbage collection when
> the user removes a property.
>
I am coming from a non-Nepomuk background with different but overlapping goals.
I agree about starting with literal properties.
I think of tags as the simplest property possible. You seem to be implying
that tags are not this simple.
I think in Nepomuk they are associated with an ontology that knows the set of
all possible tags.
This is a more feature-rich representation but also a more complex one.
I guess we'll see how the code grows.
The 15 basic elements of the Dublin Core would be another good set to start
with.
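
A minimal sketch of what "literal properties first" could look like, assuming
an in-memory store keyed by file path. The class name LiteralStore, its methods
and the "tag" property name are hypothetical and not taken from Nepomuk or any
existing library; only the 15 Dublin Core element names are standard.

```python
from collections import defaultdict

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})

# Hypothetical extra property for free-form tags (an assumption of this sketch).
ALLOWED = DUBLIN_CORE | {"tag"}


class LiteralStore:
    """Hypothetical store: each (path, property) maps to a set of literal strings."""

    def __init__(self):
        self._props = defaultdict(lambda: defaultdict(set))

    def add(self, path, prop, value):
        if prop not in ALLOWED:
            raise KeyError(f"unknown property: {prop}")
        self._props[path][prop].add(str(value))

    def remove(self, path, prop, value):
        # A literal refers to no other resource, so removing it leaves
        # nothing behind that would need garbage collection.
        self._props[path][prop].discard(str(value))

    def get(self, path, prop):
        return sorted(self._props[path][prop])


store = LiteralStore()
store.add("/home/user1/report.pdf", "tag", "work")
store.add("/home/user1/report.pdf", "title", "Quarterly report")
```

In this sketch a tag is just one more literal property, which is the "simplest
property possible" reading. An ontology-backed tag would instead be a resource
of its own, and that is where the extra bookkeeping would come in.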

> > and later growing this to support integration with Nepomuk
> > and incorporate other kinds of metadata.
> >
> > I'm happy to hear suggestions.
> >
> > There are two main design choices to consider.
> >  1. the location of the metadata
> >        one per file
> >        one metadata area per directory
> >        one per filesystem
> 
> IMHO there should be one file per file system. The reason is simple:
> that way we only need to store additional resources like the previously
> mentioned project once. If we had one file per dir or file then we would
> have to store (and later merge) these additional resources over and over.
>
Merging is an inevitable requirement when you copy data around.
You can't necessarily have one file per file system if that file system is
multi-user.
It may be sufficient for a USB flash drive but not when you have, say,
/home/user1
/home/user2

That brings in all the complexity of multiple users and security.
So I think the best thing is to be flexible.
If you have an accessible metadata directory at the root of the filesystem
you need no other, but if not, try the next level down
(or rather start at the file and work up the tree until you find one).
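
The "start at the file and work up the tree" lookup can be sketched roughly as
follows. This is only an illustration under assumptions: the directory name
".metadata" is a placeholder (the thread has not settled on a name), the
function name find_metadata_dir is hypothetical, and "accessible" is taken to
mean readable and searchable by the current user.

```python
import os
from pathlib import Path
from typing import Optional

# Placeholder name; .metadata, .nepomuk and .rdf are all still under discussion.
METADATA_DIRNAME = ".metadata"


def find_metadata_dir(target, dirname=METADATA_DIRNAME) -> Optional[Path]:
    """Return the nearest accessible metadata directory at or above target."""
    start = Path(target).resolve()
    # For a regular file, begin the search in its containing directory.
    if not start.is_dir():
        start = start.parent
    # Walk from the starting directory up to the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / dirname
        # Skip directories the current user cannot read or enter, so another
        # user's private metadata area is passed over rather than failing.
        if candidate.is_dir() and os.access(candidate, os.R_OK | os.X_OK):
            return candidate
    return None
```

Note that walking upwards returns the nearest metadata directory, so a
per-user area under /home/user1 would take precedence over one at the
filesystem root. Preferring the root first would need the loop reversed.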

> > on balance I believe per directory makes most sense.
> > Though it is not that much extra complication to say a metadata area is not
> > required for a sub-directory of a directory which already has one and this
> > would keep the metadata layout simple.
> > 
> > 2. the format of the metadata
> >   binary or text
> >   if text, TriG, Turtle, TriX or something else.
> 
> As I already mentioned I would prefer TriG since that allows us to store
> graph metadata which contains information like "when was the data
> created" and "who created the data".
>
> One could compress this file. Sadly there is no pseudo-standard for RDF
> storage yet as there is for SQL (sqlite) so using Redland seems weird to me.
>
I'll look into it

> >    there is an advantage to the simplicity of <key>=<value> for just tags
> >    but it will not scale well to complex metadata.
> >   for binary I would imagine a standard database such as sqlite.
> >   The advantage there is compactness.
> >
> > There is nothing to stop either of these being configurable but it is
> > sensible to start as you mean to go on.
> > 
> > I think metadata should live in a .metadata directory except that .metadata
> > is used by Eclipse.
> > This is something that should be adoptable as part of the Linux filesystem
> > hierarchy.
> > I don't think it should be .nepomuk as that might alienate gnomes.
> > If all metadata is RDF, .rdf might be a good choice.
> 
> I would personally go for .nepomuk for now since there will be no
> collaboration with Gnome anyway (well, at least I do not believe in it
> after trying for several years. But maybe you would have more luck ;)
> 
> Cheers,
> Sebastian
>
That is one reason why I would rather keep things simple and stand-alone,
rather than starting from the Nepomuk or Xesam ontologies and trying to force
one on the other.
I would rather stay out of that fight for now and offer an additional means of
interchanging data. :)

Regards,

Bruce.