[Nepomuk] Re: Handling multiple sources of metadata

Sun May 8 16:01:28 CEST 2011

Hi,

A couple of months ago I developed a Python script to share and back up
Nepomuk data on USB hard disks. It is available here:

http://kde-apps.org/content/show.php/Neposidekick+service+menu?content=137233

Sadly I currently have no free time to continue development or to learn
SPARQL and Virtuoso, so I have stopped working on it, but I use it frequently
to share my tags, comments and scores between different systems and users.

I don't know if this would be useful to you, but I store all the data in a
hidden text file named .nepomuksidekick in each directory, and I am really
satisfied with the results. I have two different file formats: a full one
that includes all available data (probably with some bugs :)), and a
human-readable one that is easy to understand and edit by hand but supports
only limited data. Both are CSV text files because this format is really easy
to read and write :).

Everything is done with a subclass of Nepomuk.Resource and two main methods,
toText() and fromText().
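
To give an idea of the round trip, here is a minimal sketch in plain Python.
It is not the code from the script: FileMetadata is a hypothetical stand-in
for the Nepomuk.Resource subclass, and the column order (name, rating,
comment, tags) and the ';' tag separator are assumptions made only for this
example.

```python
import csv
import io


# Hypothetical stand-in for the Nepomuk.Resource subclass; the real
# script's column layout is not shown in this mail, so this one is assumed.
class FileMetadata:
    def __init__(self, name, tags=(), comment="", rating=0):
        self.name = name
        self.tags = list(tags)
        self.comment = comment
        self.rating = rating

    def toText(self):
        """Return one CSV line: name, rating, comment, tags joined by ';'."""
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerow(
            [self.name, self.rating, self.comment, ";".join(self.tags)])
        return buf.getvalue()

    @classmethod
    def fromText(cls, line):
        """Rebuild an entry from a line produced by toText()."""
        name, rating, comment, tags = next(csv.reader([line]))
        return cls(name, tags.split(";") if tags else [], comment, int(rating))


entry = FileMetadata("photo.jpg", ["holiday", "beach"], "Sunset, 2010", 8)
line = entry.toText()
restored = FileMetadata.fromText(line)
```

The csv module takes care of quoting, so a comment containing a comma
survives the round trip. A tag containing ';' would not, which is one reason
a real format needs more care than this sketch.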

If you think this work would be useful to you, I can send you the latest
version and share what I know.

Bye
Ignacio

On Thu, May 5, 2011 at 10:18 PM, Sebastian Trüg <trueg at kde.org> wrote:

> On 05/04/2011 04:52 PM, Bruce Adams wrote:
> >> On 05/03/2011 07:24 PM, Bruce Adams wrote:
> >>>      That roughly accords with my original intentions anyway.
> >>> I was thinking in terms of a standalone tool, library & API
> >>> for managing simple metadata (just tags)
> >>
> >> IMHO it does not make sense to start with tags alone. I think it would
> >> be much simpler to only start with literal properties, i.e. those for
> >> which there is no need to store additional resources.
> >> Then the next step would be to also store the additional resources,
> >> which gets much more complicated as it also involves garbage collection
> >> when the user removes a property.
> >>
> > I am coming from a non-Nepomuk background with different but overlapping
> > goals.
> > I agree about starting with literal properties.
> > I think of tags as being the simplest property possible. You are implying
> > tags are not this simple.
>
> Well, tags are not literal props in Nepomuk but separate resources.
> Thus, I referred to literals.
>
> > I think in Nepomuk they are associated with an ontology that knows the
> > set of all possible tags.
> > This is a more feature-rich representation but also a more complex one.
> > I guess we'll see how the code grows.
> > The 15 basic elements of the Dublin Core would be another good set to
> > start with.
>
> Not really, as Dublin Core is not used in Nepomuk. I still think that
> simply saving all literal values is a good idea. I see no point in
> restricting yourself to a specific set of properties or one ontology.
> Just get all literals. This could either be done with a SPARQL query
> (FILTER(isLiteral(?v))) or by simply listing all properties and ignoring
> those that have a resource type:
>
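> #include <Nepomuk/Resource>
> #include <Nepomuk/Variant>
> Nepomuk::Resource res(url);
> QHash<QUrl, Nepomuk::Variant> props = res.properties();
> QHash<QUrl, Nepomuk::Variant>::const_iterator it;
> for (it = props.constBegin(); it != props.constEnd(); ++it) {
>   if (it.value().isResource() || it.value().isResourceList())
>      continue;  // skip properties that point to other resources
>   // it.key() is the property URI, it.value() holds a literal: save it
> }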
>
> something like that.
>
> >
> >>> and later growing this to support integration with Nepomuk
> >>> and incorporate other kinds of metadata.
> >>>
> >>> I'm happy to hear suggestions.
> >>>
> >>> There are two main design choices to consider.
> >>>  1. the location of the metadata
> >>>        one per file
> >>>        one metadata area per directory
> >>>        one per filesystem
> >>
> >> IMHO there should be one file per file system. The reason is simple:
> >> that way we only need to store additional resources like the previously
> >> mentioned project once. If we had one file per dir or file then we would
> >> have to store (and later merge) these additional resources over and
> >> over.
> >>
> > Merging is an inevitable requirement when you copy data around.
> > You can't necessarily have one file per file system if that file system
> > is multi-user.
> > It may be sufficient for a USB flash drive but not when you have, say,
> > /home/user1
> > /home/user2
>
> But IMHO this is not a use-case.
>
> >
> > It brings in all the complexity of multi-users and security.
> > So I think the best thing is to be flexible.
> > So if you have an accessible metadata directory at the root of the
> > filesystem you need no other, but if not, try the next level down
> > (or rather start at the file and work up the tree until you find one).
>
> Sure, if it can be designed flexible enough to support both - great. :)
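
The "start at the file and work up the tree" lookup could look roughly like
the sketch below. The directory name .nepomuk is only one of the candidates
discussed in this thread, and find_metadata_dir is a hypothetical helper
written for illustration, not part of any existing tool.

```python
import os

METADATA_NAME = ".nepomuk"  # assumed name; the thread has not settled on one


def find_metadata_dir(path, name=METADATA_NAME):
    """Return the nearest readable metadata directory at or above path.

    The search stops at the root of the file system that contains path
    (the mount point) and returns None if nothing was found by then.
    """
    current = os.path.abspath(path)
    if not os.path.isdir(current):
        current = os.path.dirname(current)
    while True:
        candidate = os.path.join(current, name)
        if os.path.isdir(candidate) and os.access(candidate, os.R_OK):
            return candidate
        parent = os.path.dirname(current)
        # Stop at the mount point, or at "/" where dirname() no longer changes.
        if os.path.ismount(current) or parent == current:
            return None
        current = parent
```

On a multi-user system a user who cannot read the metadata directory at the
file system root would simply get the one in their own home directory, if it
exists.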
>
> >>> On balance I believe per directory makes most sense.
> >>> Though it is not that much extra complication to say a metadata area
> >>> is not required for a sub-directory of a directory which already has
> >>> one, and this would keep the metadata layout simple.
> >>>
> >>> 2. the format of the metadata
> >>>   binary or text
> >>>   if text, TriG, Turtle, TriX or something else.
> >>
> >> As I already mentioned I would prefer TriG since that allows us to
> >> store graph metadata which contains information like "when was the data
> >> created" and "who created the data".
> >>
> >> One could compress this file. Sadly there is no pseudo-standard for RDF
> >> storage yet as there is for SQL (SQLite), so using Redland seems weird
> >> to me.
> >>
> > I'll look into it
> >
> >>>    There is an advantage to the simplicity of <key>=<value> for just
> >>>    tags but it will not scale well to complex metadata.
> >>>   For binary I would imagine a standard database such as SQLite.
> >>>   The advantage there is compactness.
> >>>
> >>>  There is nothing to stop either of these being configurable but it is
> >>> sensible to start as you mean to go on.
> >>>
> >>> I think metadata should live in a .metadata directory except that
> >>> .metadata is used by Eclipse.
> >>> This is something that should be adoptable as part of the Linux
> >>> filesystem hierarchy.
> >>> I don't think it should be .nepomuk as that might alienate gnomes.
> >>> If all metadata is RDF, .rdf might be a good choice.
> >>
> >> I would personally go for .nepomuk for now since there will be no
> >> collaboration with Gnome anyway (well, at least I do not believe in it
> >> after trying for several years). But maybe you would have more luck ;)
> >>
> >> Cheers,
> >> Sebastian
> >>
> > That is one reason why I would rather keep things simple and stand-alone,
> > rather than starting from the Nepomuk or Xesam ontologies and trying to
> > force one on the other.
>
> A certain degree of commitment to the ontologies is required. Otherwise
> we lose too much information and the result will not be useful to
> Nepomuk. That said, what I mean is the meta-metadata we use, i.e. named
> graphs and stuff like modification dates and creators and so on.
>
> > I would rather stay out of that fight for now and offer an additional
> > means of interchanging data. :)
>
> There is no fight really, especially since Tracker (Gnome) does use the
> Nepomuk ontologies, too. :)
>
> Cheers,
> Sebastian
>
> >
> > Regards,
> >
> > Bruce.
> >
> >

-- 
Cheers,
Ignacio