[Nepomuk] Fwd: sequences

Tue Sep 21 11:03:04 CEST 2010

On 09/17/2010 11:02 AM, Bèrto ëd Sèra wrote:
> Sorry, I had mistakenly sent this email to Sebastian only...
> 
> Hi!
> 
> On 16 September 2010 19:46, Sebastian Trueg <trueg at kde.org> wrote:
> 
>     It is not recommended to use RDF containers. They cannot be properly
>     queried via SPARQL, support is not guaranteed, and their semantics are
>     very unclear anyway.
>     Thus, I follow popular opinion in the semantic world and recommend not
>     to use them.
> 
>  Duly noted. We already have a high risk factor "as is"; we won't be
> looking for additional trouble.
> 
> 
>     If you need to store more information then you need to go the normal RDF
>     way: define the ontology constructs you need. We will be happy to help
>     you with that.
> 
> Okay, I'll try to use the example I found on the ontologies, so you can
> correct my misunderstandings as I go on. I put it at the end, as some
> explanations may be of help before that.
> 
>     Thus, please think twice before trying to store anything in the
>     graph/context metadata. In most cases a specific class or property might
>     make more sense.
> 
>  
> Again, whatever does the job better and with less trouble is welcome. 
> 
> 
>     Could you maybe elaborate a bit on your project?
> 
> It's a simple idea: we collect external data sources, like Wiktionary,
> AGROVOC, geographic names, public open source DBs, etc., and put them into
> a common format that allows a single interface for them all. In our DSL
> the prepared data from a single source is called a "region". We have
> bots doing the collect/update job for a region, and since data is
> semantically tagged, a user may say "I want everything about Tai-chi,
> vegan food and astronomy from the following regions in English and
> Japanese". His machine downloads the prepared, normalized data from a
> list of network-sources and keeps it up to date. While these bot-driven
> regions are read-only, the user can also upload material back to the same
> network-sources using a "community region", in order to share it. There
> may be any number of communities, as we expect people to develop
> thematic communities or simply to dislike each other's POVs sooner or
> later, and communities are self-managed. If I don't like a community, I
> don't link their stuff, and that's it. Anyway, this is a further
> development; we will have just one community to begin with.
>  
> 
>     Could you also elaborate on the distributed rep, please. Be aware that
>     Nepomuk does not provide a distributed store and it is very unlikely
>     that it will in the near future - simply because creating a distributed
>     store is very very hard, a lot of work, and requires expertise that we
>     do not have...
> 
>  
> We have no idea about how to make a "real" distributed store, either.
> The doable thing I can think of is, as I said, a list of network-sources
> that can import and distribute a number of "regions" to end users. But I
> tend to think that, since we have to upload stuff back to a "community
> region", two laptops could sync with each other using any network
> connection in place. So, for example, if I live in an isolated village in
> the middle of nowhere (a common situation in the third world) and I have
> but one box in a school, anyone coming to visit with a laptop can update
> me, provided that I told him what to download when he could get a normal
> connection and he has enough storage space for it. Or I could remain in
> the village and be sent a RAM key or a DVD with updates on it. This
> would already be a lot for most "randomly connected" situations. Most
> of Africa and a lot of Asia have little choice other than this, and it's
> especially for them that accessing "thematic knowledge" locally is of
> high value.
> 
> I suppose that since DBpedia has RDF exports there should be RDF
> imports, and we could just use this. Once export files are available,
> they could be broadcast using the p2p features (which I know nothing of,
> I just know they should be there, sooner or later). Since we do
> multimedia content, this is especially important to limit the amount of
> content one wants to store. To remain with our previous example, my
> subscription could be: "I want everything about Tai-chi, vegan food and
> astronomy from the following regions in English and Japanese, excluding
> video and audio files, pictures included". In any case I would get
> pointers to remote resource UUIDs, telling me there's a video/audio file
> (and its tags), so that I can know it's there and I can order single
> downloads if I decide some particular material is of interest. 
> 
> Now let's get to the model. "Profile" is our DSL lingo for "meaning",
> see http://en.wikipedia.org/wiki/Cognitive_semantics#Langacker:_profile_and_base :
> 
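> @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix foo:  <http://foo.bar/types#> .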
> foo:Profile rdf:type rdfs:Class .
> 
> /* A "Region" has a textual  description, along with other minor
> properties, so we want it to inherit all translation capabilities from
> Profile */
> foo:Region rdf:type rdfs:Class .
> foo:Region rdfs:subClassOf foo:Profile .
> 
> /* This is where actual content is */
> foo:Content rdf:type rdfs:Class . 
> foo:Text rdf:type rdfs:Class .
> foo:File rdf:type rdfs:Class . 

Nepomuk has a file class obviously: nfo:FileDataObject - depending on
your needs you could maybe use that instead.
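
A minimal sketch of the two ways this could look, assuming the usual
NFO namespace URI. The foo: names come from your draft, the example
resource URI is made up, and the subclass link is only one possible
choice, not a recommendation carved in stone:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nfo:  <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
@prefix foo:  <http://foo.bar/types#> .

# Option A (assumption): keep foo:File, but declare it a specialization
# of the existing Nepomuk file class, so generic file queries still match.
foo:File a rdfs:Class ;
    rdfs:subClassOf nfo:FileDataObject .

# Option B: drop foo:File and type file resources directly.
# <urn:example:file1> is a hypothetical resource URI.
<urn:example:file1> a nfo:FileDataObject .
```

Option B means one class less to maintain; Option A leaves room for
properties that only make sense for your files.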

> foo:means rdf:type rdf:Property . 
> foo:means rdfs:domain foo:Content . 
> foo:means rdfs:range foo:Profile . 
> 
> /* Here I'm in trouble, as I need what DBs call an ENUM(expression,
> definition) that defines the role of a Content instance in a dictionary
> expr=def equation. You will excuse my "creative syntax"; probably I
> should have created a "Role" class with two instances, right? */
> foo:hasRole rdf:type rdf:Property . 
> foo:hasRole rdfs:domain foo:Content . 
> foo:hasRole rdfs:range foo:(expression,definition) . 

In Nepomuk's contact ontology, roles are modeled as a type hierarchy,
i.e. instead of using something like foo:hasRole you use rdf:type with
the appropriate role type. I am not sure whether that applies here as
well, though.
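
Applied to your dictionary case, a rough sketch could look like this.
The class names foo:Expression and foo:Definition and the example
resource URIs are my invention for illustration, so treat them as
placeholders:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foo:  <http://foo.bar/types#> .

# Hypothetical role classes standing in for ENUM(expression, definition).
foo:Expression a rdfs:Class ;
    rdfs:subClassOf foo:Content .

foo:Definition a rdfs:Class ;
    rdfs:subClassOf foo:Content .

# The role is stated through the type, so no foo:hasRole is needed.
<urn:example:content1> a foo:Expression .
<urn:example:content2> a foo:Definition .
```

A query for all definitions then simply asks for resources of type
foo:Definition. Whether a resource may carry both types at once is
something the ontology above does not forbid.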

> /* How do we avoid infinite recursions here? */
> foo:isTranslationOf rdf:type rdf:Property . 
> foo:isTranslationOf rdfs:domain foo:Content . 
> foo:isTranslationOf rdfs:range foo:Content . 

You don't avoid it - at least not on the ontology level. I suppose you
have to do that in client code.

> /* Do we have Booleans? Anyway, if an instance of content gets modified,
> all of its translations are marked "fuzzy" by this flag */
> foo:isVerified rdf:type rdf:Property . 
> foo:isVerified rdfs:domain foo:Content . 
> foo:isVerified rdfs:range foo:Boolean . 

We do have booleans, yes: xsd:boolean.
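
As a small sketch, assuming the standard XML Schema namespace and a
made-up resource URI, the property and a "fuzzy" translation could be
written like this:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foo:  <http://foo.bar/types#> .

foo:isVerified a rdf:Property ;
    rdfs:domain foo:Content ;
    rdfs:range xsd:boolean .

# A translation whose source was modified: flagged as not verified.
# <urn:example:content2> is a hypothetical resource URI.
<urn:example:content2> foo:isVerified "false"^^xsd:boolean .
```

Resetting the flag on all translations when a source changes is still
up to the client code; the ontology only provides the datatype.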

> /* Now these two properties are under a mutex constraint: something is
> either a text or a file. Not sure whether this distinction is important
> for Nepomuk; in PostgreSQL we use it to separate things we can set a
> full-text search on from things that must be searched otherwise. Content
> is also used as a meta-level, to send out minimal information about
> remote files that aren't actually present on the system */
> foo:hasText rdf:type rdf:Property . 
> foo:hasText rdfs:domain foo:Content . 
> foo:hasText rdfs:range foo:Text . 

Am I understanding correctly that you need foo:Text since you need to
split the text into paragraphs or something equivalent?

> foo:hasFile rdf:type rdf:Property . 
> foo:hasFile rdfs:domain foo:Content . 
> foo:hasFile rdfs:range foo:File . 

What are the semantics here? If it is simply intended to relate
foo:Content to a file, you could stick to a generic relation like
nao:isRelated, since the related resource is already typed as a file.
Thus, the property hasFile would not add any additional semantic
knowledge.
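
To illustrate, here is a sketch assuming the usual NAO and NFO
namespace URIs and two made-up resource URIs. The file type on the
object already says what kind of thing is being related:

```turtle
@prefix nao: <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#> .
@prefix nfo: <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#> .
@prefix foo: <http://foo.bar/types#> .

# Hypothetical content item related to a hypothetical file resource.
<urn:example:content1> a foo:Content ;
    nao:isRelated <urn:example:file1> .

<urn:example:file1> a nfo:FileDataObject .
```

If the link is meant to say more than "these belong together", for
example "this file is the payload of that content", then a dedicated
property would still be justified.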

> /* This assigns content to a Region */
> foo:belongsTo rdf:type rdf:Property . 
> foo:belongsTo rdfs:domain foo:Content . 
> foo:belongsTo rdfs:range foo:Region . 

Could be modeled via nie:isPartOf.
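
A sketch of both variants, assuming the usual NIE namespace URI and
made-up resource URIs. Please check the NIE specification for the
declared domain and range of nie:isPartOf before relying on it, since
RDFS inference would apply those types to your resources:

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix nie:  <http://www.semanticdesktop.org/ontologies/2007/01/19/nie#> .
@prefix foo:  <http://foo.bar/types#> .

# Variant 1: use the existing property directly.
<urn:example:content1> a foo:Content ;
    nie:isPartOf <urn:example:region1> .

<urn:example:region1> a foo:Region .

# Variant 2 (assumption): keep foo:belongsTo, but declare it a
# specialization so generic tools can still follow the relation.
foo:belongsTo rdfs:subPropertyOf nie:isPartOf .
```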

> Now, before I write too much garbage syntax, is this readable/usable?
> There is much more to come, although I expect changes to be needed, for
> the existing model to adapt to this new environment.

Are my comments helpful or are you looking for something else?

Cheers,
Sebastian