[Nepomuk] sequences

Sebastian Trueg trueg at kde.org
Thu Sep 16 18:46:01 CEST 2010


It is not recommended to use RDF containers. They cannot be properly
queried via SPARQL, support is not guaranteed, and their semantics are
very unclear anyway.
Thus, I follow popular opinion in the semantic world and recommend not
to use them.
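
To make the querying problem a bit more concrete, here is a minimal
Python sketch. It is only an illustration: plain tuples stand in for
triples, no RDF library or Nepomuk API is involved, and all xxx: names
are the placeholders used elsewhere in this thread. With an explicit
index property the order falls out of a single sort (roughly what an
ORDER BY would do in SPARQL), whereas the rdf:first/rdf:rest chain has to
be walked one hop at a time.

```python
# Triples as plain (subject, predicate, object) tuples.
# All xxx: names are placeholders, not an existing ontology.
triples = [
    # Index-based modelling: order is stored as a literal on each sentence.
    ("doc1", "xxx:hasSentence", "sen2"),
    ("doc1", "xxx:hasSentence", "sen3"),
    ("doc1", "xxx:hasSentence", "sen1"),
    ("sen1", "xxx:sentenceIndex", 0),
    ("sen2", "xxx:sentenceIndex", 1),
    ("sen3", "xxx:sentenceIndex", 2),
    # List-based modelling: order is implied by a chain of list nodes.
    ("doc1", "xxx:hasContents", "item1"),
    ("item1", "rdf:first", "sen1"), ("item1", "rdf:rest", "item2"),
    ("item2", "rdf:first", "sen2"), ("item2", "rdf:rest", "item3"),
    ("item3", "rdf:first", "sen3"), ("item3", "rdf:rest", "rdf:nil"),
]


def objects(subject, predicate):
    """Return all objects of the triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def ordered_by_index(doc):
    """One lookup plus one sort, no matter how long the document is."""
    sentences = objects(doc, "xxx:hasSentence")
    return sorted(sentences, key=lambda s: objects(s, "xxx:sentenceIndex")[0])


def ordered_by_list(doc):
    """One lookup per list node: the chain must be followed step by step."""
    result = []
    node = objects(doc, "xxx:hasContents")[0]
    while node != "rdf:nil":
        result.append(objects(node, "rdf:first")[0])
        node = objects(node, "rdf:rest")[0]
    return result
```

Both functions return the same order for doc1; the difference is that
the second one needs a loop whose length depends on the data, which is
the part that is awkward to express in a single query.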

On 09/14/2010 04:11 PM, Bèrto ëd Sèra wrote:
> We need something more complex than @en, as we store information on the
> script and the orthographic convention as well, so I guess it will have
> to be properties for that. This is necessary to treat languages that mix
> scripts (like Serbian and Japanese) and/or languages that underwent
> mutations, like Turkish (used to be scripted in Arabic, now in Latin) or
> that may use different orthographic standards, like Nederlands or
> German. These features are relevant to correctors, that may "fish" in
> the reps to get "proper data" according to a given standard, and even
> alphabetic filters that will allow only certain chars in a string (if
> any app needs this).

If you need to store more information then you need to go the normal RDF
way: define the ontology constructs you need. We will be happy to help
you with that.
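
As a rough idea of what "defining the constructs you need" could look
like, here is a small Python sketch. It is an assumption-laden
illustration, not a proposal for the actual ontology: the properties
xxx:language, xxx:script and xxx:orthography are hypothetical names, and
plain tuples stand in for triples. The point is only that once script
and orthography are ordinary properties, filtering on them is an
ordinary lookup.

```python
# Each string is a resource with its own descriptive properties.
# xxx:language, xxx:script and xxx:orthography are hypothetical names.
triples = [
    ("str1", "xxx:hasText", "здраво"),
    ("str1", "xxx:language", "sr"),
    ("str1", "xxx:script", "Cyrl"),
    ("str2", "xxx:hasText", "zdravo"),
    ("str2", "xxx:language", "sr"),
    ("str2", "xxx:script", "Latn"),
    ("str3", "xxx:hasText", "daß"),
    ("str3", "xxx:language", "de"),
    ("str3", "xxx:script", "Latn"),
    ("str3", "xxx:orthography", "1901"),
    ("str4", "xxx:hasText", "dass"),
    ("str4", "xxx:language", "de"),
    ("str4", "xxx:script", "Latn"),
    ("str4", "xxx:orthography", "1996"),
]


def texts_matching(**wanted):
    """Return the texts of all resources whose xxx:<key> equals each value."""
    facts = set(triples)
    subjects = sorted({s for s, p, _ in triples if p == "xxx:hasText"})
    hits = [
        s for s in subjects
        if all((s, "xxx:" + key, value) in facts for key, value in wanted.items())
    ]
    return [o for hit in hits for s, p, o in triples
            if s == hit and p == "xxx:hasText"]
```

For example, texts_matching(language="sr", script="Latn") picks only the
Latin-script Serbian string, and texts_matching(language="de",
orthography="1996") picks only the post-reform German spelling.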

> We currently have a thing called "region" that is probably better called
> "context" in nepomuk, so that an end user may decide what contexts are
> relevant for him, plus he can filter by language/script etc, so that
> people host only what they can use. So there would be a metacontext,
> storing languages, scripts etc plus "meanings", where "meaning" is
> simply an attribute (actually a uuid) by which we can group "things that
> mean the same thing", and strings are actually of two types: expression
> vs definition, to obtain a dictionary like structure. 

A context on the data level is actually a graph, i.e. the fourth node
added to the triple. In Nepomuk we use it to store reification data like
the creation time of the triple and the type of the data (currently we only
make a distinction between ontologies and instance data). Soon we will
add geo-location and creator data, which will enrich the semantics even more.
Thus, please think twice before trying to store anything in the
graph/context metadata. In most cases a specific class or property might
make more sense.
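
A minimal Python sketch of the idea, under stated assumptions: tuples of
four elements stand in for quads, and the "kind" and "created" keys are
made-up stand-ins for the graph metadata, not the vocabulary Nepomuk
actually uses. It shows that the graph carries information about a group
of statements as a whole, while anything describing a single resource
stays an ordinary property.

```python
from datetime import datetime, timezone

# (subject, predicate, object, graph): the graph is the fourth node.
quads = [
    ("sen1", "xxx:hasText", "foo bar hello", "graph1"),
    ("sen1", "xxx:sentenceIndex", 0, "graph1"),
    ("xxx:hasText", "rdf:type", "rdf:Property", "graph2"),
]

# Metadata about the graphs themselves. The keys are illustrative only.
graph_meta = {
    "graph1": {
        "kind": "instance data",
        "created": datetime(2010, 9, 16, tzinfo=timezone.utc),
    },
    "graph2": {
        "kind": "ontology",
        "created": datetime(2010, 9, 1, tzinfo=timezone.utc),
    },
}


def statements_of_kind(kind):
    """Return the triples stored in graphs of the given kind."""
    return [(s, p, o) for s, p, o, g in quads if graph_meta[g]["kind"] == kind]
```

Here statements_of_kind("ontology") returns only the property
definition, and statements_of_kind("instance data") returns the two
statements about sen1.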

> The rest is about being able to define "user templates", which is
> basically the same as we already can do by adding tags in Dolphin. The
> only difference is that we need this tags to be a Class property (where
> class is an element of a taxonomy), so some tags may be free text, other
> may require you choose a "meaning" or a number etc. This is basically
> like wiki templates, only you get automagically by tagging a resource or
> a string in a given category.

That is no problem. We already do that.

> We currently have some 2 million "meanings" in an experimental db, just
> by assembling a couple of wiktionaries and free stuff from FAO, but we
> aim to reach one billion once the storage is stable. So by the time
> this gets to be a desktop app we will eventually need to enable sections
> of the metalevel, not to force users to download one billion uuids just
> to manage the 100K strings set they want. Most people are going to use
> this as a "brainstorming notepad" to connect concepts, rather than to
> make translations.

Could you maybe elaborate a bit on your project?

> The last candy on the pie is that we would LOVE not to need any
> centralized store, and simply run it as a distributed rep, into which
> people can write and share what they feel to share. This would solve the
> problem of those who have seldom connections (like people traveling in
> far places to study rare languages) as they can basically work offline
> AND would take a large expense chapter off the foundation, since we
> don't need to feed a hosting company... not sure all of this networking
> can be done, but this was the general strategy, the night we had too
> much to dream :))))))

Could you also elaborate on the distributed rep, please. Be aware that
Nepomuk does not provide a distributed store and it is very unlikely
that it will in the near future - simply because creating a distributed
store is very very hard, a lot of work, and requires expertise that we
do not have...

Cheers,
Sebastian

> On 14 September 2010 15:51, Roman Evstifeev <someuniquename at gmail.com
> <mailto:someuniquename at gmail.com>> wrote:
> 
>     oh, something got messed up - this is correct:
> 
>       <doc1> a <xxx:Document> .
>       <doc1> <xxx:hasContents> <listitem1> .
> 
>       <listitem1> a <RDF:List> .
>       <listitem2> a <RDF:List> .
>       <listitem3> a <RDF:List> .
> 
>       <listitem1> <RDF:first> <sen1> .
>       <listitem1> <RDF:rest> <listitem2> .
> 
>       <listitem2> <RDF:first> <sen2> .
>       <listitem2> <RDF:rest> <listitem3> .
> 
>       <listitem3> <RDF:first> <sen3> .
>       <listitem3> <RDF:rest> <rdf:nil> . # end of list
> 
>       <sen1> <xxx:hasText> "foo bar hello"@en .
>       <sen2> <xxx:hasText> "The world is mine"@en .
>       <sen3> <xxx:hasText> "good bye"@en .
> 
> 
>     2010/9/14 Roman Evstifeev <someuniquename at gmail.com
>     <mailto:someuniquename at gmail.com>>:
>     > maybe RDF Collections can be used here to store sequences of resources?
>     >
>     >  <doc1> a <xxx:Document> .
>     >  <doc1> <xxx:hasContents> <listitem1> .
>     >
>     >  <listitem1> a <RDF:List>
>     >  <listitem2> a <RDF:List>
>     >  <listitem3> a <RDF:List>
>     >
>     >  <listitem1> <RDF:first> <sen1> .
>     >  <listitem1> <RDF:rest> <listitem2> .tem2> <RDF:first> <sen2> .
>     >  <listitem2> <RDF:rest> <listitem3> .
>     >
>     >  <listitem3> <RDF:first> <sen3> .
>     >  <listitem3> <RDF:rest> <rdf:nil> . # end of list
>     >
>     >  <sen1> <xxx:hasText> "foo bar hello"@en .
>     >  <sen2> <xxx:hasText> "The world is mine"@en .
>     >  <sen3> <xxx:hasText> "good bye"@en .
>     >
>     >
>     >
>     > 2010/9/14 Sebastian Trüg <trueg at kde.org <mailto:trueg at kde.org>>:
>     >> If I understand correctly you want to store sentences as RDF literals.
>     >> Thus, something along the lines of:
>     >>
>     >> <res> <xxx:hasSentence> "foo bar hello"@en .
>     >> <res> <xxx:hasSentence> "The world is mine"@en .
>     >> ...
>     >>
>     >> And you want to order them. IMHO this needs to be done by introducing
>     >> the necessary ontology entities. One could think of something like:
>     >>
>     >> <res> <xxx:hasDocument> <doc1> .
>     >> <doc1> a xxx:Document .
>     >> <doc1> <xxx:hasSentence> <sen1> .
>     >> <doc1> <xxx:hasSentence> <sen2> .
>     >> <sen1> <xxx:hasText> "foo bar hello"@en .
>     >> <sen1> <xxx:sentenceIndex> 0 .
>     >> <sen2> <xxx:hasText> "The world is mine"@en .
>     >> <sen2> <xxx:sentenceIndex> 1 .
>     >>
>     >> It is of course complex - maybe someone else can come up with a less
>     >> complicated approach?
>     >> BTW: I am pretty sure someone already wrote an ontology for documents.
>     >> So all we have to do is look for that and make it Nepomukish. :)
>     >>
>     >> Another approach is to store both: the sentences and the full doc.
>     >>
>     >> Cheers,
>     >> Sebastian
>     >>
>     >> On 09/14/2010 10:16 AM, Bèrto ëd Sèra wrote:
>     >>> Hi!
>     >>>
>     >>> I just saw the "Excerpts for Query Results" thing and I really love it.
>     >>> Now there is a last thing we will need for ambaradan: sequences. We
>     >>> store free text in a translatable format, that is, we break it in
>     >>> sentences. So we need to keep an ordered sequence to "rebuild" the doc.
>     >>> Other graphs do not depend on a particular order (apart from hierarchy),
>     >>> as taxonomy should be sorted according to the end user's language. Is it
>     >>> possible/easy to do this? It should be something like a "document"
>     >>> class, that is possibly a graph of chapters, which in turn are an ordered
>     >>> list of sentences.
>     >>>
>     >>> Everything else we need appears to be there already.
>     >>>
>     >>> Bèrto
>     >>>
>     >>> --
>     >>> ==============================
>     >>> Constitution du 24 juin 1793 - Article 35. - Quand le
>     gouvernement viole
>     >>> les droits du peuple, l'insurrection est, pour le peuple et pour
>     chaque
>     >>> portion du peuple, le plus sacré des droits et le plus
>     indispensable des
>     >>> devoirs.
>     >>>
>     >>>
>     >>>
>     >>> _______________________________________________
>     >>> Nepomuk mailing list
>     >>> Nepomuk at kde.org <mailto:Nepomuk at kde.org>
>     >>> https://mail.kde.org/mailman/listinfo/nepomuk
>     >>
>     >
>     >
>     >
> 
> 
> 
> 

