[Nepomuk] sequences

Bèrto ëd Sèra berto.d.sera at gmail.com
Tue Sep 14 16:11:50 CEST 2010


Yes, this would do, thanks. We are going to use it to store legal docs as
well, as there is a lot of official material we can get from the EU, to
start with. So a proper sequence here is relevant, as we present a single
listitem for translation, but also offer it in the context of the full
document for a better understanding of the original context.
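
A minimal sketch of that "single item, but shown in context" idea, assuming the rdf:first / rdf:rest chain from the quoted example below. The triples are held in a plain Python list and the helper names (value, sentences, in_context) are made up for illustration; none of this is a Nepomuk or Soprano API.

```python
# Hypothetical in-memory triples mirroring the quoted example (not a Nepomuk API).
TRIPLES = [
    ("doc1", "xxx:hasContents", "listitem1"),
    ("listitem1", "rdf:first", "sen1"),
    ("listitem1", "rdf:rest", "listitem2"),
    ("listitem2", "rdf:first", "sen2"),
    ("listitem2", "rdf:rest", "listitem3"),
    ("listitem3", "rdf:first", "sen3"),
    ("listitem3", "rdf:rest", "rdf:nil"),
    ("sen1", "xxx:hasText", "foo bar hello"),
    ("sen2", "xxx:hasText", "The world is mine"),
    ("sen3", "xxx:hasText", "good bye"),
]


def value(subject, predicate):
    """Return the first object found for (subject, predicate), or None."""
    for s, p, o in TRIPLES:
        if s == subject and p == predicate:
            return o
    return None


def sentences(doc):
    """Walk the rdf:first / rdf:rest chain and return sentence ids in order."""
    ordered = []
    node = value(doc, "xxx:hasContents")
    while node is not None and node != "rdf:nil":
        ordered.append(value(node, "rdf:first"))
        node = value(node, "rdf:rest")
    return ordered


def in_context(doc, sentence):
    """Return (texts before, target text, texts after) for one sentence."""
    ids = sentences(doc)
    i = ids.index(sentence)
    before = [value(s, "xxx:hasText") for s in ids[:i]]
    after = [value(s, "xxx:hasText") for s in ids[i + 1:]]
    return before, value(sentence, "xxx:hasText"), after
```

With this layout the translator can be handed only the target text, while the surrounding sentences are rebuilt on demand from the same list.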

We need something more complex than @en, as we store information on the
script and the orthographic convention as well, so I guess it will have to
be properties for that. This is necessary to handle languages that mix
scripts (like Serbian and Japanese), languages that changed script over
time, like Turkish (once written in Arabic script, now in Latin), and
languages that may use different orthographic standards, like Dutch or
German. These features are relevant to correctors, who may "fish" in the
reps to get "proper data" according to a given standard, and even to
alphabetic filters that allow only certain chars in a string (if any app
needs this).

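As a rough sketch of why a plain @en tag is not enough, the snippet below keeps language, script and orthographic convention as separate properties of each string, then filters on them. The class and function names are hypothetical, the script values follow ISO 15924 codes, and the orthography labels are placeholders chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    text: str
    language: str     # e.g. "sr", "de"
    script: str       # ISO 15924 code, e.g. "Latn", "Cyrl"
    orthography: str  # placeholder label for the orthographic convention


# Illustrative sample data only.
SAMPLES = [
    TaggedString("Београд", "sr", "Cyrl", "standard"),
    TaggedString("Beograd", "sr", "Latn", "standard"),
    TaggedString("daß", "de", "Latn", "1901"),
    TaggedString("dass", "de", "Latn", "1996"),
]

LATIN = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def select(strings, language, script=None, orthography=None):
    """Keep strings matching the language and any further criteria given."""
    return [
        s for s in strings
        if s.language == language
        and (script is None or s.script == script)
        and (orthography is None or s.orthography == orthography)
    ]


def allowed(text, alphabet):
    """Alphabetic filter: True if every non-space char is in the alphabet."""
    return all(ch in alphabet or ch.isspace() for ch in text)
```

A corrector looking for data in one standard would call something like select(SAMPLES, "de", orthography="1996"), while an app that accepts only certain chars would check allowed(text, LATIN) before storing a string.
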
We currently have a thing called "region" that is probably better called
"context" in Nepomuk, so that an end user may decide which contexts are
relevant for him, plus he can filter by language/script etc, so that people
host only what they can use. So there would be a metacontext, storing
languages, scripts etc plus "meanings", where a "meaning" is simply an
attribute (actually a uuid) by which we can group "things that mean the same
thing". Strings are actually of two types, expression vs definition, to
obtain a dictionary-like structure.
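
A small sketch of the "meaning" grouping, assuming each meaning is just a uuid and each string is either an expression or a definition attached to it. MeaningStore and its methods are invented for this example and do not describe the actual ambaradan schema.

```python
import uuid

EXPRESSION = "expression"
DEFINITION = "definition"


class MeaningStore:
    """Groups strings by a meaning uuid (hypothetical structure)."""

    def __init__(self):
        # meaning uuid -> list of (kind, language, text)
        self._rows = {}

    def add(self, meaning, kind, language, text):
        if kind not in (EXPRESSION, DEFINITION):
            raise ValueError(f"unknown string type: {kind}")
        self._rows.setdefault(meaning, []).append((kind, language, text))

    def entry(self, meaning, language):
        """Dictionary-like entry for one meaning in one language."""
        entry = {EXPRESSION: [], DEFINITION: []}
        for kind, lang, text in self._rows.get(meaning, []):
            if lang == language:
                entry[kind].append(text)
        return entry

    def translations(self, meaning):
        """All expressions sharing the meaning, keyed by language."""
        result = {}
        for kind, lang, text in self._rows.get(meaning, []):
            if kind == EXPRESSION:
                result.setdefault(lang, []).append(text)
        return result


store = MeaningStore()
water = uuid.uuid4()  # the uuid carries no semantics, it only groups strings
store.add(water, EXPRESSION, "en", "water")
store.add(water, EXPRESSION, "it", "acqua")
store.add(water, DEFINITION, "en", "a clear liquid")
```

Because the uuid is the only join key, two strings are "translations" of each other simply by pointing at the same meaning.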

The rest is about being able to define "user templates", which is basically
the same as what we can already do by adding tags in Dolphin. The only
difference is that we need these tags to be a Class property (where class is
an element of a taxonomy), so some tags may be free text, others may require
you to choose a "meaning" or a number etc. This is basically like wiki
templates, only you get them automagically by tagging a resource or a string
in a given category.
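
One possible reading of such a "user template", sketched under the assumption that a taxonomy class simply declares which tags it accepts and what kind of value each takes (free text, number, or a meaning uuid). Template, FIELD_TYPES and the sample field names are all hypothetical.

```python
import uuid

# Allowed kinds of tag value; the names are placeholders for this sketch.
FIELD_TYPES = {"text": str, "number": (int, float), "meaning": uuid.UUID}


class Template:
    """Typed tags attached to one class (category) of a taxonomy."""

    def __init__(self, category, fields):
        unknown = set(fields.values()) - set(FIELD_TYPES)
        if unknown:
            raise ValueError(f"unknown field types: {sorted(unknown)}")
        self.category = category
        self.fields = dict(fields)

    def validate(self, tags):
        """Return a list of problems; an empty list means the tags fit."""
        errors = []
        for name, value in tags.items():
            kind = self.fields.get(name)
            if kind is None:
                errors.append(f"{name}: not part of template {self.category}")
            # bool is a subclass of int, so reject it explicitly
            elif isinstance(value, bool) or not isinstance(value, FIELD_TYPES[kind]):
                errors.append(f"{name}: expected {kind}")
        return errors


legal_doc = Template(
    "LegalDocument",
    {"title": "text", "year": "number", "subject": "meaning"},
)
```

Tagging a resource as a LegalDocument would then offer exactly these fields, and legal_doc.validate(...) would flag a tag whose value has the wrong kind.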

We currently have some 2 million "meanings" in an experimental db, just by
assembling a couple of wiktionaries and free stuff from FAO, but we aim to
reach one billion once the storage is stable. So by the time this gets to
be a desktop app we will eventually need to enable only sections of the
metalevel, so as not to force users to download one billion uuids just to
manage the 100K-string set they want. Most people are going to use this as a
"brainstorming notepad" to connect concepts, rather than to make
translations.

The last candy on the pie is that we would LOVE not to need any centralized
store, and simply run it as a distributed rep, into which people can write
and share what they feel like sharing. This would solve the problem of those
who are only rarely connected (like people traveling in far places to study
rare languages), as they can basically work offline, AND it would take a
large expense item off the foundation's budget, since we wouldn't need to
feed a hosting company... Not sure all of this networking can be done, but
this was the general strategy, the night we had too much to dream :))))))

Bèrto

On 14 September 2010 15:51, Roman Evstifeev <someuniquename at gmail.com> wrote:

> oh, something got messed up - this is correct:
>
>   <doc1> a <xxx:Document> .
>   <doc1> <xxx:hasContents> <listitem1> .
>
>   <listitem1> a <RDF:List> .
>   <listitem2> a <RDF:List> .
>   <listitem3> a <RDF:List> .
>
>   <listitem1> <RDF:first> <sen1> .
>   <listitem1> <RDF:rest> <listitem2> .
>
>   <listitem2> <RDF:first> <sen2> .
>   <listitem2> <RDF:rest> <listitem3> .
>
>   <listitem3> <RDF:first> <sen3> .
>   <listitem3> <RDF:rest> <rdf:nil> . # end of list
>
>   <sen1> <xxx:hasText> "foo bar hello"@en .
>   <sen2> <xxx:hasText> "The world is mine"@en .
>   <sen3> <xxx:hasText> "good bye"@en .
>
>
> 2010/9/14 Roman Evstifeev <someuniquename at gmail.com>:
> > maybe RDF Collections can be used here to store sequences of resources?
> >
> >  <doc1> a <xxx:Document> .
> >  <doc1> <xxx:hasContents> <listitem1> .
> >
> >  <listitem1> a <RDF:List>
> >  <listitem2> a <RDF:List>
> >  <listitem3> a <RDF:List>
> >
> >  <listitem1> <RDF:first> <sen1> .
> >  <listitem1> <RDF:rest> <listitem2> .tem2> <RDF:first> <sen2> .
> >  <listitem2> <RDF:rest> <listitem3> .
> >
> >  <listitem3> <RDF:first> <sen3> .
> >  <listitem3> <RDF:rest> <rdf:nil> . # end of list
> >
> >  <sen1> <xxx:hasText> "foo bar hello"@en .
> >  <sen2> <xxx:hasText> "The world is mine"@en .
> >  <sen3> <xxx:hasText> "good bye"@en .
> >
> >
> >
> > 2010/9/14 Sebastian Trüg <trueg at kde.org>:
> >> If I understand correctly you want to store sentences as RDF literals.
> >> Thus, something along the lines of:
> >>
> >> <res> <xxx:hasSentence> "foo bar hello"@en .
> >> <res> <xxx:hasSentence> "The world is mine"@en .
> >> ...
> >>
> >> And you want to order them. IMHO this needs to be done by introducing
> >> the necessary ontology entities. One could think of something like:
> >>
> >> <res> <xxx:hasDocument> <doc1> .
> >> <doc1> a xxx:Document .
> >> <doc1> <xxx:hasSentence> <sen1> .
> >> <doc1> <xxx:hasSentence> <sen2> .
> >> <sen1> <xxx:hasText> "foo bar hello"@en .
> >> <sen1> <xxx:sentenceIndex> 0 .
> >> <sen2> <xxx:hasText> "The world is mine"@en .
> >> <sen2> <xxx:sentenceIndex> 1 .
> >>
> >> It is of course complex - maybe someone else can come up with a less
> >> complicated approach?
> >> BTW: I am pretty sure someone already wrote an ontology for documents.
> >> So all we have to do is look for that and make it Nepomukish. :)
> >>
> >> Another approach is to store both: the sentences and the full doc.
> >>
> >> Cheers,
> >> Sebastian
> >>
> >> On 09/14/2010 10:16 AM, Bèrto ëd Sèra wrote:
> >>> Hi!
> >>>
> >>> I just saw the "Excerpts for Query Results" thing and I really love it.
> >>> Now there is a last thing we will need for ambaradan: sequences. We
> >>> store free text in a translatable format, that is, we break it into
> >>> sentences. So we need to keep an ordered sequence to "rebuild" the doc.
> >>> Other graphs do not depend on a particular order (apart from hierarchy),
> >>> as taxonomy should be sorted according to the end user's language. Is it
> >>> possible/easy to do this? It should be something like a "document"
> >>> class, that is possibly a graph of chapters, which in turn are ordered
> >>> lists of sentences.
> >>>
> >>> Everything else we need appears to be there already.
> >>>
> >>> Bèrto
> >>>
> >>> --
> >>> ==============================
> >>> Constitution du 24 juin 1793 - Article 35. - Quand le gouvernement viole
> >>> les droits du peuple, l'insurrection est, pour le peuple et pour chaque
> >>> portion du peuple, le plus sacré des droits et le plus indispensable des
> >>> devoirs.
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Nepomuk mailing list
> >>> Nepomuk at kde.org
> >>> https://mail.kde.org/mailman/listinfo/nepomuk



-- 
==============================
Constitution du 24 juin 1793 - Article 35. - Quand le gouvernement viole les
droits du peuple, l'insurrection est, pour le peuple et pour chaque portion
du peuple, le plus sacré des droits et le plus indispensable des devoirs.