[gcompris-devel] The next file format
Emmanuel Charruau
echarruau at gmail.com
Mon Aug 25 12:50:03 UTC 2014
Hi everybody,
I had some thoughts about this format. I already shared them with Bruno, but
it would be interesting to know what you think about them.
The most important thing for me is that this format has to be directly
writable/readable by any normal teacher.
What I had in mind was something like wiki markup.
My next thought was that, to keep it extremely simple, we need a
dedicated syntax per activity.
Teachers would not need to learn these languages; they would simply copy and
paste examples taken from a cookbook and adapt them.
Let's take the example we put on our wiki activity proposal page.
Here the activity is an MCQ and the question is "find the opposite."
exercice type: MCQ
Where is the [little|big,red] cat
The answers to propose go inside [ ]; the first one is the correct one.
You can see that the syntax is extremely easy and can be mastered in a few
seconds by teachers.
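To give an idea of how little code such a line needs, here is a rough Python sketch that turns the MCQ line above into a JSON-ready structure. It assumes that the text before "|" is the correct answer and that the comma-separated items after it are the wrong choices. The function name parse_mcq and the JSON keys are made up for illustration and are not an agreed format.

```python
import json
import re

# Assumed reading of the example syntax: inside [ ], the text before "|" is
# the correct answer and the comma-separated items after it are distractors.
BRACKET = re.compile(r"\[([^\[\]]*)\]")


def parse_mcq(line):
    """Turn one MCQ line into a dict that can be serialized as JSON."""
    match = BRACKET.search(line)
    if match is None:
        raise ValueError("no [ ] answer block found")
    correct, _, rest = match.group(1).partition("|")
    distractors = [item.strip() for item in rest.split(",") if item.strip()]
    # Replace the answer block with a placeholder marking the blank.
    prompt = line[:match.start()] + "___" + line[match.end():]
    return {
        "type": "MCQ",  # hypothetical key names, for illustration only
        "prompt": prompt.strip(),
        "correct": correct.strip(),
        "choices": [correct.strip()] + distractors,
    }


question = parse_mcq("Where is the [little|big,red] cat")
print(json.dumps(question, indent=2))
```

An activity would presumably shuffle "choices" before showing them, since the correct answer always comes first in the source line.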
Let's now take a format for matching pairs of words. The format could be:
exercice type: pairs of choices
[3-Blue] [1-Rouge]
[1-Red] [2-Jaune]
[2-Yellow] [3-Bleu]
Again the format is extremely simple to understand.
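As a rough sketch, the pairs example could be read like this in Python. It assumes that the number before the dash is a pair key, so that an item in the left column matches the item carrying the same number in the right column. The function name parse_pairs and the output shape are hypothetical.

```python
import re

# Assumed item shape: [<pair number>-<word>], two items per line.
ITEM = re.compile(r"\[(\d+)-([^\[\]]+)\]")


def parse_pairs(text):
    """Return (left word, right word) tuples matched by their pair number."""
    left, right = {}, {}
    for line in text.strip().splitlines():
        items = ITEM.findall(line)
        if len(items) != 2:
            raise ValueError(f"expected two [n-word] items: {line!r}")
        (left_id, left_word), (right_id, right_word) = items
        left[int(left_id)] = left_word.strip()
        right[int(right_id)] = right_word.strip()
    # Every number used on the left must also appear on the right.
    if left.keys() != right.keys():
        raise ValueError("unmatched pair numbers")
    return [(left[n], right[n]) for n in sorted(left)]


EXAMPLE = """
[3-Blue] [1-Rouge]
[1-Red] [2-Jaune]
[2-Yellow] [3-Bleu]
"""
pairs = parse_pairs(EXAMPLE)
print(pairs)
```

The teacher controls the on-screen order of each column simply by the order of the lines, while the numbers carry the solution.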
The exercise can be typed directly into a web interface and tested on the
fly by the teacher using a JavaScript program in the browser, without the
need to plug into GCompris.
The questions can be copied directly into JSON without parsing of any type.
Any comments?
Regards,
Emmanuel
2014-08-24 14:56 GMT+02:00 Inge Wallin <inge at lysator.liu.se>:
> On Wednesday, August 20, 2014 11:23:02 Bruno Coudoin wrote:
> > Hi,
> >
> > On the GCompris side we are also working on defining a new dataset
> > format for the new Qt Quick based version.
> >
> > While we are not specifically addressing language or grammar
> > application, we found the need to define a way to create, distribute,
> > share and play datasets for specific activities.
>
> I think it would be a good thing if we could share at least the container
> format and parts of the library to access it.
>
> > This may be a list of words for hangman, letters for a typing tutor,
> > images and voices for language learning tools, a text with holes for a
> > reading exercise, ...
>
> In these cases we should definitely share the format!
>
> > As you can see the type of exercises are very different and we cannot
> > end up with a dataset structure common to all of them. Also, an
> > important part of the task is to provide a way for teachers to create
> > datasets, assign them to children and if they want share them.
> >
> > Based on our requirements we ended up with a different proposal than
> > yours but we are also in the early stage on it, Holger just wrote what
> > we came up with in Randa on our wiki:
> > http://gcompris.net/wiki/Dataset_handling
> >
> > As you can see in our idea we define a 'datatype' which would be common
> > to all and a 'payload' which would be readable only by a given activity
> > and an editor following its mime type. Thus the whole infrastructure we
> > can set up to manage datasets is not specific to a given type of
> > exercise.
> >
> > Being a Qt Quick application we selected json as the format of choice as
> > it is more human readable and native.
>
> It seems that JSON has been a favourite also on the pure language
> applications side...
>
> > Also we have not mentioned it in this wiki page but we are already
> > distributing in the new GCompris voice files as Qt qrc files. They are
> > Qt specific but very easy to manage because you can load them
> > dynamically and then access their content through qrc:// url anywhere in
> > Qml. To us, 'qrc' is good candidate for the container of the datasets as
> > it is Qt native.
>
> I read up a little on qrc, and it seems that these files are hard-coded
> resources that are part of the source code. A resource compiler, rcc, is
> then used to create C source files that are later compiled using the
> normal C/C++ compiler and become part of the executable.
>
> This is a good way to collect parts of the application like icons and
> similar. But it is not what the discussion about the new file format is
> about. We are talking about external data files that can be downloaded or
> created after the program is already installed.
>
> > Some feedback on your proposal, I am confused by the 'confidence level'.
> > If it is a student mark, it may not be desirable to put it in the
> > dataset itself because it makes sense to have it on a read only storage
> > area (most distros will do that). On this topic at GCompris we are
> > interested in a teacher specific tool to help them in their daily usage,
> > we have started specifying it there:
> > http://gcompris.net/wiki/Administration_design
>
> Yes, confidence level is not the ideal term but so far we haven't found
> anything better. What it is is the level of confidence that the student
> has for a particular word. This tries to capture how strongly the word is
> put into the memory of the student, or loosely put how long it can be
> expected to be before they forget it. If you are not familiar with the
> term 'spaced repetition training', I urge you to look it up on Wikipedia,
> they have an excellent article about it.
>
> This used to be known as 'grade' in Parley but we are providing a tool for
> learning and training, not for testing so grade is not applicable. Besides,
> grades also have a negative connotation in that you are a bad person if you
> have a bad grade. Since any low confidence level is a necessary step to the
> higher confidence levels we wanted to get rid of the grade connotations and
> that was the best we could come up with. I guess 'mark' is vaguely similar
> to grade in this case.
>
> Would you be interested in sharing the container format with us if we can
> agree on how we store the internal data?
>
> -Inge
>
>
> > Bruno.
> >
> > Le 17/08/2014 12:46, Inge Wallin a écrit :
> > > Hey there,
> > >
> > > I talked a little with Andreas Xavier the other day about the new file
> > > format, and now with 4.14 tagged we thought it would be a good time to
> > > start discussing that.
> > >
> > > With this mail I will try to establish a common base that I think we
> > > can all agree about and with that out of the way we can start to argue
> > > about the details. I got a suggestion from Andreas with a very ambitious
> > > xsl definition but I think that most of what he suggested is for the
> > > next level of discussions.
> > >
> > > KVTML
> > >
> > > ---------
> > >
> > > First a short recapitulation about kvtml, our current file format. It's
> > > XML based and has a number of sections represented by the following
> > > tags:
> > >
> > > - <information>: general info such as author, title, etc
> > >
> > > - <identifiers>: Specification of the languages, including tenses,
> > > articles, word classes, etc
> > >
> > > - <entries>: this is a list of entries, where each entry is a list of
> > > translations, which normally is a word with possibly extra data such as
> > > attached image, sound, etc
> > >
> > > - <lessons>: This is what the user normally sees. Each lesson is more
> > > or less a list of translations with a title.
> > >
> > > - <wordtypes>: This is a list of what is normally called word class in
> > > linguistics
> > >
> > > Each identifier (language), entry, translation (=word inside an entry)
> > > has an id. The translations refer to the identifiers (languages) using
> > > the id and the lessons refer to the words by using the id of the
> > > entries.
> > >
> > > Note that this is the file format itself. Applications such as Parley
> > > add an extra dimension to it by letting the user select languages to
> > > practice but that is not reflected in the file format.
> > >
> > > One other notable thing is that each translation (word) has a
> > > confidence level (known as "grade" in the file) attached to it. This is
> > > a numerical value between 1 and 7 of the confidence that the student has
> > > reached in recognizing that particular word. This means that every word
> > > can only have one confidence level attached to it which is one of the
> > > big problems with kvtml. More about that below.
> > >
> > > New file format
> > >
> > > ----------------------
> > >
> > > The new format needs to address a number of shortcomings in kvtml:
> > >
> > > - pictures and audio are not contained inside it but are referenced as
> > > outside files. This makes it difficult to store lessons on a server,
> > > e.g. GHNS, and also to download them
>