[Nepomuk] Re: Handling multiple sources of metadata

Thu May 5 22:18:48 CEST 2011

On 05/04/2011 04:52 PM, Bruce Adams wrote:
>> On 05/03/2011 07:24 PM, Bruce Adams wrote:
>>> That roughly accords with my original intentions anyway.
>>> I was thinking in terms of a standalone tool, library & API
>>> for managing simple metadata (just tags)
>>
>> IMHO it does not make sense to start with tags alone. I think it would
>> be much simpler to start with only literal properties, i.e. those for
>> which there is no need to store additional resources.
>> Then the next step would be to also store the additional resources, which
>> gets much more complicated as it also involves garbage collection when
>> the user removes a property.
>>
> I am coming from a non-nepomuk background with different but overlapping goals.
> I agree about starting with literal properties. 
> I think of tags as being the simplest property possible. You are implying tags 
> are not this simple.

Well, tags are not literal properties in Nepomuk but separate resources.
That is why I referred to literals.
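To illustrate why resource-valued properties such as tags need garbage collection while literals do not, here is a minimal C++ sketch under stated assumptions: a toy in-memory triple store (the Triple and MetadataStore types are hypothetical, not Nepomuk API), with a tag modelled as a resource linked via nao:hasTag and carrying a nao:prefLabel. Removing a statement triggers a reference-counting-style sweep of resources nothing points at any more; like plain reference counting, it does not reclaim cycles.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Hypothetical toy statement: subject, predicate, object. Resource objects
// (e.g. a tag URI) are flagged; literal objects are plain strings.
struct Triple {
    std::string subject, predicate, object;
    bool objectIsResource;

    bool operator<(const Triple& o) const {
        return std::tie(subject, predicate, object, objectIsResource) <
               std::tie(o.subject, o.predicate, o.object, o.objectIsResource);
    }
};

class MetadataStore {
public:
    void add(const Triple& t) { triples_.insert(t); }

    // Removing a literal statement needs no follow-up. Removing a statement
    // that points at a resource may orphan that resource, so collect it.
    // (A real store would exempt resources that stand for actual files.)
    void remove(const Triple& t) {
        triples_.erase(t);
        if (t.objectIsResource) collect(t.object);
    }

    const std::set<Triple>& triples() const { return triples_; }

private:
    // True if any remaining statement still uses `uri` as its object.
    bool isReferenced(const std::string& uri) const {
        return std::any_of(triples_.begin(), triples_.end(),
                           [&](const Triple& x) {
                               return x.objectIsResource && x.object == uri;
                           });
    }

    // Delete every statement about an unreferenced resource, then check the
    // resources those statements pointed at. Cycles are not reclaimed.
    void collect(const std::string& uri) {
        if (isReferenced(uri)) return;
        std::set<std::string> children;
        for (auto it = triples_.begin(); it != triples_.end();) {
            if (it->subject == uri) {
                if (it->objectIsResource) children.insert(it->object);
                it = triples_.erase(it);
            } else {
                ++it;
            }
        }
        for (const std::string& child : children) collect(child);
    }

    std::set<Triple> triples_;
};
```

So removing the only nao:hasTag statement pointing at a tag also drops the tag's label, while removing a literal such as a nie:title needs no further work; a tag still used by another file survives.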

> I think in nepomuk they are associated with an ontology that knows the set of 
> all possible tags.
> This is a more feature rich representation but also a more complex one.
> I guess we'll see how the code grows.
> The 15 basic elements of the Dublin Core would be another good set to start
> with.

Not really, as Dublin Core is not used in Nepomuk. I still think that
simply saving all literal values is a good idea. I see no point in
restricting yourself to a specific set of properties or one ontology.
Just get all literals. This could either be done with a SPARQL query
using FILTER(isLiteral(?v)) or by simply listing all properties and
ignoring those that have a resource type:

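#include <Nepomuk/Resource>
#include <Nepomuk/Variant>

Nepomuk::Resource res(url);
QHash<QUrl, Nepomuk::Variant> props = res.properties();
for (QHash<QUrl, Nepomuk::Variant>::const_iterator it = props.constBegin();
     it != props.constEnd(); ++it) {
   if (it.value().isResource() || it.value().isResourceList())
      continue; // resource value (e.g. a tag): skip it
   // it.key() is a literal property and it.value() its literal value
}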

something like that.
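The SPARQL variant boils down to a query like the one built by this small C++ helper (literalPropertiesQuery is a hypothetical name). Assuming the KDE 4 Nepomuk/Soprano APIs, the string could then be run with Nepomuk::ResourceManager::instance()->mainModel()->executeQuery(query, Soprano::Query::QueryLanguageSparql); that call is not part of the sketch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Builds a SPARQL SELECT returning only the literal-valued properties of
// one resource. Characters not allowed inside an IRI reference (<...>) are
// rejected rather than escaped, to keep the sketch short.
std::string literalPropertiesQuery(const std::string& resourceUri) {
    const std::string forbidden = "<>\"{}|^`\\ \t\n";
    if (resourceUri.empty() ||
        resourceUri.find_first_of(forbidden) != std::string::npos) {
        throw std::invalid_argument("unsupported resource URI: " + resourceUri);
    }
    return "SELECT ?p ?v WHERE { <" + resourceUri + "> ?p ?v . "
           "FILTER(isLiteral(?v)) }";
}
```

For file:///home/user1/notes.txt this yields SELECT ?p ?v WHERE { <file:///home/user1/notes.txt> ?p ?v . FILTER(isLiteral(?v)) }, i.e. the same filtering as the loop, done by the store instead of the client.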

>  
>>> and later growing this to support integration with nepomuk
>>> and to incorporate other kinds of metadata.
>>>
>>> I'm happy to hear suggestions.
>>>
>>> There are two main design choices to consider.
>>>  1. the location of the metadata:
>>>        one per file
>>>        one metadata area per directory
>>>        one per filesystem
>>
>> IMHO there should be one file per file system. The reason is simple:
>> that way we only need to store additional resources like the previously
>> mentioned project once. If we had one file per dir or file, then we would
>> have to store (and later merge) these additional resources over and over.
>>
> Merging is an inevitable requirement when you copy data around.
> You can't necessarily have one file per file system if that file system is 
> multi-user.
> It may be sufficient for a USB flash drive but not when you have, say,
> /home/user1
> /home/user2

But IMHO this is not a use case.
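Whatever the layout, copying data between stores (e.g. from a USB stick into a home directory) means merging. A minimal sketch under simplifying assumptions: a store is just a std::set of hypothetical Statement values (no graphs, no datatypes), identical statements collapse so a shared resource such as the project mentioned earlier is kept only once, and for predicates the caller declares single-valued a differing value is reported instead of silently overwritten. That conflict policy is an assumption, not something decided here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical statement type; ordering by (subject, predicate, object)
// keeps all values of one subject/predicate pair adjacent in a std::set.
struct Statement {
    std::string subject, predicate, object;
    bool operator<(const Statement& o) const {
        return std::tie(subject, predicate, object) <
               std::tie(o.subject, o.predicate, o.object);
    }
};
using Store = std::set<Statement>;

// Merges `src` into `dst`. Identical statements collapse into one, so a
// shared resource described in both stores is kept once. For predicates in
// `singleValued`, an incoming value that differs from an existing one is
// not inserted but returned as a conflict for the caller to resolve.
std::vector<Statement> mergeInto(Store& dst, const Store& src,
                                 const std::set<std::string>& singleValued) {
    std::vector<Statement> conflicts;
    for (const Statement& s : src) {
        if (singleValued.count(s.predicate) != 0) {
            bool clash = false;
            // First statement with this subject/predicate ("" sorts first).
            auto it = dst.lower_bound(Statement{s.subject, s.predicate, ""});
            for (; it != dst.end() && it->subject == s.subject &&
                   it->predicate == s.predicate; ++it) {
                if (it->object != s.object) { clash = true; break; }
            }
            if (clash) { conflicts.push_back(s); continue; }
        }
        dst.insert(s);
    }
    return conflicts;
}
```

With one store per file system this merge runs only when data moves between file systems; with per-directory stores the shared resources get merged far more often, which is Sebastian's point.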

>  
> It brings in all the complexity of multi-users and security.
> So I think the best thing is to be flexible.
> So if you have an accessible metadata directory at the root of the filesystem
> you need no other but if not try the next level down
> (or rather start at the file and work up the tree until you find one).

Sure, if it can be designed flexibly enough to support both, great. :)
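The "start at the file and work up the tree" lookup could look roughly like this C++17 sketch. The function name findMetadataDir and the directory name passed in (".metadata", ".nepomuk" or whatever gets chosen) are hypothetical; only the existence of the directory is checked, not whether the current user may actually read or write it.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Starting at the directory that contains `file`, walk towards the root and
// return the first `<dir>/<metaDirName>` that exists as a directory.
// Only existence is checked; real code would also test read/write access.
std::optional<fs::path> findMetadataDir(const fs::path& file,
                                        const std::string& metaDirName) {
    std::error_code ec;
    fs::path dir = fs::absolute(file, ec).parent_path();
    if (ec) return std::nullopt;
    for (;;) {
        const fs::path candidate = dir / metaDirName;
        if (fs::is_directory(candidate, ec)) return candidate;
        const fs::path parent = dir.parent_path();
        if (parent == dir) return std::nullopt;  // reached the root
        dir = parent;
    }
}
```

For /home/user1/photos/a.jpg this checks /home/user1/photos/.metadata, /home/user1/.metadata, /home/.metadata and /.metadata in that order, so a per-filesystem store at the root and per-user stores lower down can coexist.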

>>> On balance I believe per directory makes most sense.
>>> Though it is not that much extra complication to say a metadata area is not
>>> required for a sub-directory of a directory which already has one, and this
>>> would keep the metadata layout simple.
>>>
>>> 2. the format of the metadata:
>>>   binary or text
>>>   if text, TriG, Turtle, TriX or something else.
>>
>> As I already mentioned, I would prefer TriG since that allows us to store
>> graph metadata, which contains information like "when was the data
>> created" and "who created the data".
>>
>> One could compress this file. Sadly there is no pseudo-standard for RDF
>> storage yet as there is for SQL (SQLite), so using Redland seems weird to me.
>>
> I'll look into it
>  
>>>    there is an advantage to the simplicity of <key>=<value> for just tags
>>>    but it will not scale well to complex metadata.
>>>   for binary I would imagine a standard database such as SQLite.
>>>   The advantage there is compactness.
>>>
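For comparison, a plain <key>=<value> tag file is trivial to read. This sketch assumes a hypothetical per-directory file with one line per file name, e.g. holiday.jpg=beach,family (TagMap and parseTagFile are illustrative names). It has no room for datatypes, languages, resource values or graph metadata, and a key containing '=' already breaks it, which is the scaling problem mentioned above.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using TagMap = std::map<std::string, std::vector<std::string>>;

// Parses lines of the (hypothetical) form <key>=<tag1>,<tag2>,...
// Blank lines and lines starting with '#' are skipped; lines without '='
// are ignored. The key ends at the first '=', so keys containing '='
// cannot be represented.
TagMap parseTagFile(std::istream& in) {
    TagMap tags;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        std::vector<std::string>& list = tags[line.substr(0, eq)];
        std::istringstream values(line.substr(eq + 1));
        std::string tag;
        while (std::getline(values, tag, ','))
            if (!tag.empty()) list.push_back(tag);
    }
    return tags;
}
```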
>>> There is nothing to stop either of these being configurable, but it is
>>> sensible to start as you mean to go on.
>>>
>>> I think metadata should live in a .metadata directory, except that .metadata
>>> is used by Eclipse.
>>> This is something that should be adoptable as part of the Linux filesystem
>>> hierarchy.
>>> I don't think it should be .nepomuk as that might alienate the Gnome people.
>>> If all metadata is RDF, .rdf might be a good choice.
>>
>> I would personally go for .nepomuk for now since there will be no
>> collaboration with Gnome anyway (well, at least I do not believe in it
>> after trying for several years. But maybe you would have more luck ;)
>>
>> Cheers,
>> Sebastian
>>
> That is one reason why I would rather keep things simple and stand-alone,
> rather than starting from the Nepomuk or Xesam ontologies and trying to force
> one on the other.

A certain degree of commitment to the ontologies is required. Otherwise
we lose too much information and the result will not be useful to
Nepomuk. That said, what I mean is the meta-meta-data we use, i.e. named
graphs and things like modification dates, creators and so on.
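To make the named-graph idea concrete, here is a rough C++ sketch that writes the literal properties of one file into a TriG named graph and records who created it and when in a separate metadata graph. The graph URI, the "-metadata" naming, the toTriG/LiteralStatement names and the use of nao:creator are assumptions for illustration; nrl:InstanceBase, nrl:GraphMetadata and nrl:coreGraphMetadataFor follow the NRL graph pattern as far as I know, but the exact terms should be checked against the ontologies.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical literal statement about a file: prefixed property name
// (e.g. "nie:title") plus a plain string value.
struct LiteralStatement {
    std::string property;
    std::string value;
};

// Escapes quotes, backslashes and newlines for a Turtle/TriG string literal.
// Other control characters are not handled in this sketch.
std::string escapeLiteral(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            default:   out += c;
        }
    }
    return out;
}

// Writes a data graph with the file's literals and a metadata graph saying
// when (createdIso, assumed to be a valid xsd:dateTime) and by whom.
std::string toTriG(const std::string& fileUri, const std::string& graphUri,
                   const std::vector<LiteralStatement>& stmts,
                   const std::string& creatorUri, const std::string& createdIso) {
    std::ostringstream o;
    o << "@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#> .\n"
      << "@prefix nrl: <http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#> .\n"
      << "@prefix nie: <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .\n"
      << "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n";
    // Data graph: the actual statements about the file.
    o << "<" << graphUri << "> {\n";
    for (const LiteralStatement& s : stmts)
        o << "  <" << fileUri << "> " << s.property << " \""
          << escapeLiteral(s.value) << "\" .\n";
    o << "}\n\n";
    // Metadata graph: statements about the data graph itself.
    o << "<" << graphUri << "-metadata> {\n"
      << "  <" << graphUri << "> a nrl:InstanceBase ;\n"
      << "      nao:created \"" << createdIso << "\"^^xsd:dateTime ;\n"
      << "      nao:creator <" << creatorUri << "> .\n"
      << "  <" << graphUri << "-metadata> a nrl:GraphMetadata ;\n"
      << "      nrl:coreGraphMetadataFor <" << graphUri << "> .\n"
      << "}\n";
    return o.str();
}
```

Keeping the "who/when" data in its own graph means a merge or a deletion can treat a whole batch of statements as one unit, which plain Turtle or <key>=<value> files cannot express.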

> I would rather stay out of that fight for now and offer an additional means of 
> interchanging data. :)

There is no fight really, especially since Tracker (Gnome) does use the
Nepomuk ontologies, too. :)

Cheers,
Sebastian

> 
> Regards,
> 
> Bruce.
> 
>