[Nepomuk] Fwd: sequences

Fri Sep 17 11:02:03 CEST 2010

Sorry, I had mistakenly sent this email to Sebastian only...

Hi!

On 16 September 2010 19:46, Sebastian Trueg <trueg at kde.org> wrote:

> It is not recommended to use RDF containers. They cannot properly
> be queried via SPARQL, support is not guaranteed, and their semantics are
> very unclear anyway.
> Thus, I follow popular opinion in the semantic world and recommend not
> to use them.

Duly noted. Our risk factor is already high "as is", so we won't go
looking for additional trouble.

>
> If you need to store more information then you need to go the normal RDF
> way: define the ontology constructs you need. We will be happy to help
> you with that.
>
Okay, I'll try to use the example I found in the ontologies, so you can
correct my misunderstandings as I go. I put it at the end, as some
explanations may help before that.

> Thus, please think twice before trying to store anything in the
> graph/context metadata. In most cases a specific class or property might
> make more sense.
>

Again, whatever does the job better and with less trouble is welcome.

>
> Could you maybe elaborate a bit on your project?
>
It's a simple idea: we collect external data sources, like Wiktionary,
AGROVOC, geographic names, public open source DBs, etc., and put them into a
common format that allows a single interface for them all. In our DSL the
prepared data from a single source is called a "region". We have bots doing
the collect/update job for a region, and since data is semantically tagged,
a user may say "I want everything about Tai-chi, vegan food and astronomy
from the following regions in English and Japanese". His machine downloads
the prepared normalized data from a list of network-sources to keep
up-to-date. While these bot-driven regions are read-only, the user can also
upload stuff back to the same network-sources using a "community region", in
order to share it. There may be any number of communities, as we expect
people to develop thematic communities or simply to dislike each other's POVs,
sooner or later, and communities are self-managed. If I don't like a
community I don't link their stuff, and that's it. Anyway, this is a further
development; we will have just one community to begin with.

> Could you also elaborate on the distributed rep, please. Be aware that
> Nepomuk does not provide a distributed store and it is very unlikely
> that it will in the near future - simply because creating a distributed
> store is very very hard, a lot of work, and requires expertise that we
> do not have...
>

We have no idea about how to make a "real" distributed store, either. The
doable thing I can think of is, as I said, a list of network-sources that
can import and distribute a number of "regions" to end users. But I tend to
think that since we have to upload stuff back to a "community region", two
laptops could sync with each other using any available network connection. So,
for example, if I live in an isolated village in the middle of nowhere (a
common situation in the third world) and I have but one box in a school,
anyone coming to visit with a laptop can update me, provided that I told him
what to download when he could get a normal connection and he has enough
storage space for it. Or I could remain in the village and be sent a RAM key
or a DVD with updates on it. This would already be a lot for most "randomly
connected" situations. Most of Africa and a lot of Asia have little choice
other than this, and it's especially for them that accessing "thematic
knowledge" locally is of high value.

I suppose that since dbpedia has RDF exports there should be RDF imports,
and we could just use this. Once export files are available, they could be
broadcast using the p2p features (which I know nothing of, I just know they
should be there, sooner or later). Since we handle multimedia content, it is
especially important to be able to limit the amount of content one stores. To
remain with our previous example, my subscription could be: "I want
everything about Tai-chi, vegan food and astronomy from the following
regions in English and Japanese, excluding video and audio files, pictures
included". In any case I would get pointers to remote resource UUIDs, telling
me there's a video/audio file (and its tags), so that I know it's there
and can order single downloads if I decide some particular material is of
interest.

Now let's get to the model. "Profile" is our DSL lingo for "meaning", see
http://en.wikipedia.org/wiki/Cognitive_semantics#Langacker:_profile_and_base
 :

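@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foo: <http://foo.bar/types#> .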
foo:Profile rdf:type rdfs:Class .

# A "Region" has a textual description, along with other minor properties,
# so we want it to inherit all translation capabilities from Profile
foo:Region rdf:type rdfs:Class .
foo:Region rdfs:subClassOf foo:Profile .

# This is where actual content is
foo:Content rdf:type rdfs:Class .
foo:Text rdf:type rdfs:Class .
foo:File rdf:type rdfs:Class .

foo:means rdf:type rdf:Property .
foo:means rdfs:domain foo:Content .
foo:means rdfs:range foo:Profile .

# Here I'm in trouble, as I need what DBs call an ENUM(expression,
# definition) that defines the role of a Content instance in a dictionary
# expr=def equation. You will excuse my "creative syntax", probably I should
# have created a "Role" class with two instances, right?
foo:hasRole rdf:type rdf:Property .
foo:hasRole rdfs:domain foo:Content .
foo:hasRole rdfs:range foo:(expression,definition) .
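
For what it's worth, the "Role" class idea could be sketched roughly as follows. The names foo:Role, foo:expression and foo:definition are made up for illustration and are not existing Nepomuk terms; this is one possible shape, not a vetted ontology:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foo:  <http://foo.bar/types#> .

# Hypothetical enumeration class standing in for ENUM(expression, definition).
foo:Role rdf:type rdfs:Class .

# The two allowed values are ordinary instances of that class.
foo:expression rdf:type foo:Role .
foo:definition rdf:type foo:Role .

# hasRole then points at one of those instances.
foo:hasRole rdf:type rdf:Property ;
    rdfs:domain foo:Content ;
    rdfs:range foo:Role .
```

Note that plain RDFS does not by itself stop someone from adding a third foo:Role instance later; the sketch only names the two intended values.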

# How do we avoid infinite recursions here?
foo:isTranslationOf rdf:type rdf:Property .
foo:isTranslationOf rdfs:domain foo:Content .
foo:isTranslationOf rdfs:range foo:Content .
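
A cycle in the data is legal RDF, so any recursion risk sits in whatever code follows the links rather than in the schema. One way to keep the links loop-free by convention is to point every translation at a single source item, giving a star rather than a chain. A minimal sketch, assuming hypothetical instance names:

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foo: <http://foo.bar/types#> .

# Hypothetical source item; it has no isTranslationOf link of its own.
foo:taichiIntroEn rdf:type foo:Content .

# Each translation points directly at the source, never at another translation.
foo:taichiIntroJa rdf:type foo:Content ;
    foo:isTranslationOf foo:taichiIntroEn .

foo:taichiIntroFr rdf:type foo:Content ;
    foo:isTranslationOf foo:taichiIntroEn .
```

With this convention, finding all translations of an item is a single hop in either direction, so nothing needs to walk the property repeatedly.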

# Do we have Booleans? Anyway, if an instance of content gets modified, all
# of its translations are marked "fuzzy" by this flag
foo:isVerified rdf:type rdf:Property .
foo:isVerified rdfs:domain foo:Content .
foo:isVerified rdfs:range foo:Boolean .
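
RDF has no Boolean class of its own, but the XML Schema datatype xsd:boolean is commonly used as a property range, and Turtle accepts bare true/false as literals of that type. A sketch of the flag under that assumption, with a hypothetical instance foo:someContent:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foo:  <http://foo.bar/types#> .

# The range is a datatype, so values are literals rather than resources.
foo:isVerified rdf:type rdf:Property ;
    rdfs:domain foo:Content ;
    rdfs:range xsd:boolean .

# Hypothetical translation marked "fuzzy" after its source was modified.
foo:someContent rdf:type foo:Content ;
    foo:isVerified false .
```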

# Now these two properties are under a mutex constraint: something is either a
# text or a file. Not sure whether this distinction is important for Nepomuk;
# in PostgreSQL we use it to separate things we can set a full-text search on
# from things that must be searched otherwise. Content is also used as a
# meta-level, to send out minimal information about remote files that aren't
# actually present on the system
foo:hasText rdf:type rdf:Property .
foo:hasText rdfs:domain foo:Content .
foo:hasText rdfs:range foo:Text .

foo:hasFile rdf:type rdf:Property .
foo:hasFile rdfs:domain foo:Content .
foo:hasFile rdfs:range foo:File .
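
Plain RDFS cannot express "either text or file, never both". One possible approach, sketched here with hypothetical subclasses foo:TextContent and foo:FileContent, is to split Content in two and declare the halves disjoint with OWL. Whether a given store actually checks the constraint is a separate question this sketch does not answer:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix foo:  <http://foo.bar/types#> .

# Hypothetical subclasses: a Content item is meant to be one or the other.
foo:TextContent rdf:type rdfs:Class ;
    rdfs:subClassOf foo:Content .
foo:FileContent rdf:type rdfs:Class ;
    rdfs:subClassOf foo:Content .

# OWL statement that no resource may be an instance of both classes.
foo:TextContent owl:disjointWith foo:FileContent .

# Each property is attached to its own subclass instead of to Content.
foo:hasText rdf:type rdf:Property ;
    rdfs:domain foo:TextContent ;
    rdfs:range foo:Text .
foo:hasFile rdf:type rdf:Property ;
    rdfs:domain foo:FileContent ;
    rdfs:range foo:File .
```

Under this layout, an item carrying both hasText and hasFile would be inferred to belong to both subclasses, which an OWL-aware checker could flag as a contradiction.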

# This assigns content to a Region
foo:belongsTo rdf:type rdf:Property .
foo:belongsTo rdfs:domain foo:Content .
foo:belongsTo rdfs:range foo:Region .

Now, before I write too much garbage syntax, is this readable/usable? There
is much more to come, although I expect changes will be needed for the
existing model to adapt to this new environment.

Bèrto

-- 
==============================
Constitution of 24 June 1793 - Article 35. - When the government violates the
rights of the people, insurrection is, for the people and for each portion
of the people, the most sacred of rights and the most indispensable of duties.