[Nepomuk] Re: Handling multiple sources of metadata
Sebastian Trüg
trueg at kde.org
Thu May 5 22:18:48 CEST 2011
On 05/04/2011 04:52 PM, Bruce Adams wrote:
>> On 05/03/2011 07:24 PM, Bruce Adams wrote:
>>> That roughly accords with my original intentions anyway.
>>> I was thinking in terms of a standalone tool, library & api
>>> for managing simple meta data (just tags)
>>
>> IMHO it does not make sense to start with tags alone. I think it would
>> be much simpler to only start with literal properties, i.e. those for
>> which there is no need to store additional resources.
>> Then the next step would be to also store the additional resources which
>> gets much more complicated as it also involves garbage collection when
>> the user removes a property.
>>
> I am coming from a non-nepomuk background with different but overlapping goals.
> I agree about starting with literal properties.
> I think of tags as being the simplest property possible. You are implying tags
> are not this simple.
Well, tags are not literal props in Nepomuk but separate resources.
Thus, I referred to literals.
> I think in nepomuk they are associated with an ontology that knows the set of
> all possible tags.
> This is a more feature rich representation but also a more complex one.
> I guess we'll see how the code grows.
> The 15 basic elements of the Dublin Core would be another good set to start
> with.
Not really, as Dublin Core is not used in Nepomuk. I still think that
simply saving all literal values is a good idea. I see no point in
restricting yourself to a specific set of properties or one ontology.
Just get all literals. This could either be done with a SPARQL query
(FILTER(isLiteral(?v))) or by simply listing all properties and ignoring
those that have a resource type:
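Nepomuk::Resource res(url);
const QHash<QUrl, Nepomuk::Variant> props = res.properties();
QHash<QUrl, Nepomuk::Variant>::const_iterator it;
for (it = props.constBegin(); it != props.constEnd(); ++it) {
    // skip values that refer to other resources; what remains are literals
    if (it.value().isResource() || it.value().isResourceList())
        continue;
    // it.key() is the property URI, it.value() the literal value to save
}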
something like that.
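
For the SPARQL route, here is a minimal sketch of how such a query could be put together. The helper name literalPropertiesQuery is made up for illustration, and the URI is assumed to be a valid, already escaped IRI. Actually running the query against the store is left out on purpose.

```cpp
#include <string>

// Hypothetical helper (not part of any Nepomuk API): builds a SPARQL query
// that selects every property/value pair of one resource whose value is a
// literal. Assumes resourceUri is a valid IRI that needs no further escaping.
std::string literalPropertiesQuery(const std::string& resourceUri)
{
    return "SELECT ?p ?v WHERE { <" + resourceUri + "> ?p ?v . "
           "FILTER(isLiteral(?v)) }";
}
```

Calling literalPropertiesQuery("file:///home/user1/a.txt") would give a query that returns only the literal-valued properties of that file, which is the same selection the loop above makes on the client side.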
>
>>> and later growing this to support integration with nepomuk.
>>> and incorporate other kinds of metadata.
>>>
>>> I'm happy to hear suggestions.
>>>
>>> There are two main design choices to consider.
>>> 1. the location of the metadata
>>> one per file
>>> one metadata area per directory
>>> one per filesystem
>>
>> IMHO there should be one file per file system. The reason is simple:
>> that way we only need to store additional resources like the previously
>> mentioned project once. If we had one file per dir or file then we would
>> have to store (and later merge) these additional resources over and over.
>>
> Merging is an inevitable requirement when you copy data around.
> You can't necessarily have one file per file system if that file system is
> multi-user.
> It may be sufficient for a USB flashdrive but not when you have say
> /home/user1
> /home/user2
But IMHO this is not a use-case.
>
> It brings in all the complexity of multi-users and security.
> So I think the best thing is to be flexible.
> So if you have an accessible metadata directory at the root of the filesystem
> you need no other but if not try the next level down
> (or rather start at the file and work up the tree until you find one).
Sure, if it can be designed flexible enough to support both - great. :)
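
The "start at the file and work up the tree" lookup could be sketched roughly as below, using only the C++17 standard library. The function name findMetadataDir is hypothetical, and the default directory name ".nepomuk" is just the name floated in this thread, not an agreed convention. Access checks and multi-user handling are ignored.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: starting at the directory that contains `file`, walk
// towards the filesystem root and return the first metadata directory found.
// The default name ".nepomuk" is an assumption taken from this discussion.
std::optional<fs::path> findMetadataDir(const fs::path& file,
                                        const std::string& storeName = ".nepomuk")
{
    fs::path dir = fs::absolute(file).lexically_normal().parent_path();
    while (true) {
        std::error_code ec;  // ignore errors such as unreadable directories
        const fs::path candidate = dir / storeName;
        if (fs::is_directory(candidate, ec))
            return candidate;
        const fs::path parent = dir.parent_path();
        if (parent.empty() || parent == dir)  // reached the root, nothing found
            return std::nullopt;
        dir = parent;
    }
}
```

With this order the nearest metadata directory wins, so a per-directory store overrides one at the root of the file system. If the root store should take precedence instead, the same walk can collect all candidates and pick the last one.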
>>> on balance I believe per directory makes most sense.
>>> Though it is not that much extra complication to say a metadata area is not
>>> required for a sub-directory of a directory which already has one and this
>>> would keep the meta-data layout simple.
>>>
>>> 2. the format of the metadata
>>> binary or text
>>> if text, trig, turtle, trix or something else.
>>
>> As I already mentioned I would prefer trig since that allows us to store
>> graph metadata which contains information like "when was the data
>> created" and "who created the data".
>>
>> One could compress this file. Sadly there is no pseudo-standard for RDF
>> storage yet as there is for SQL (sqlite) so using redland seems weird to me.
>>
> I'll look into it
>
>>> there is an advantage to the simplicity of <key>=<value> for just tags
>>> but it will not scale well to complex meta data.
>>> for binary I would imagine a standard database such as sqlite.
>>> The advantage there is compactness.
>>>
>>> There is nothing to stop either of these being configurable but it is
>>> sensible to start as you mean to go on.
>>>
>>> I think metadata should live in a .metadata directory except that .metadata
>>> is used by Eclipse.
>>> This is something that should be adoptable as part of the linux filesystem
>>> hierarchy.
>>> I don't think it should be .nepomuk as that might alienate gnomes.
>>> If all metadata is rdf .rdf might be a good choice.
>>
>> I would personally go for .nepomuk for now since there will be no
>> collaboration with Gnome anyway (well, at least I do not believe in it
>> after trying for several years. But maybe you would have more luck ;)
>>
>> Cheers,
>> Sebastian
>>
> That is one reason why I would rather keep things simple and stand-alone.
> Rather than starting from the nepomuk or xesam ontologies and trying to force
> one on the other.
A certain degree of commitment to the ontologies is required. Otherwise
we lose too much information and the result will not be useful to
Nepomuk. That said, what I mean is the meta-meta-data we use, i.e. named
graphs and stuff like modification dates and creators and so on.
> I would rather stay out of that fight for now and offer an additional means of
> interchanging data. :)
There is no fight really, especially since Tracker (Gnome) does use the
Nepomuk ontologies, too. :)
Cheers,
Sebastian
>
> Regards,
>
> Bruce.
>
>