The next file format

Tue Aug 26 19:51:59 UTC 2014

Hello Bruno,

I am working on the code for our new common file-handling library.
I have read the two websites that you referenced, and I will comment
on your email inline below.

I browsed some of the code at https://git-next.kde.org/kde/gcompris.
There are many activities (~100); congratulations. They have been ported
to Qt5; double congratulations. I looked at three of them (penalty,
missing letter and clockgame) to try to understand your requirements.

It looks like GCompris is looking for a common method of storing
semantically disparate resources, to provide both a uniform interface to
the activities' resources and a common means of distribution. Judging only
from the resources designated qrc:/location, you will be storing activity
backgrounds, source code files, etc. If I have misunderstood what you want
to put in the data repository, then some of my concerns below may not
apply.

I think we are trying to do something slightly different.
We are trying to store information whose semantic meaning is
common to all the applications, not application-specific
information like backgrounds, cursors, etc.
We expect the information to be re-usable by, or of interest to, more than
one application.

That said, I do think we are overlooking some of our own application-specific
differences, particularly in the definition of courses with
lessons/units. One solution might be a way to mark application-specific
information as a black box: it is handled by an editor provided by that
application and ignored by everything else.
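To make the black-box idea concrete, here is a minimal Python sketch. The record layout, the "app_specific" key and the field names are all hypothetical, not part of either proposal: each application reads the shared fields plus its own opaque section, and when it saves, it passes every other application's section through untouched.

```python
import copy

# Hypothetical record: shared semantic fields plus an "app_specific" section
# holding one opaque black box per application.
record = {
    "word": "Hund",
    "translation": "dog",
    "app_specific": {
        "parley": {"confidence": 3},
        "kanagram": {"hint": "a pet"},
    },
}


def view_for(record: dict, app: str) -> tuple[dict, dict]:
    """Return copies of the shared fields and of this app's own black box."""
    shared = {k: copy.deepcopy(v) for k, v in record.items() if k != "app_specific"}
    own = copy.deepcopy(record.get("app_specific", {}).get(app, {}))
    return shared, own


def save_from(record: dict, app: str, shared: dict, own: dict) -> dict:
    """Build the updated record: new shared fields and this app's black box,
    with every other application's black box passed through unchanged."""
    blobs = copy.deepcopy(record.get("app_specific", {}))
    blobs[app] = copy.deepcopy(own)
    updated = copy.deepcopy(shared)
    updated["app_specific"] = blobs
    return updated


# KAnagram edits its own hint; Parley's data survives untouched.
shared, own = view_for(record, "kanagram")
own["hint"] = "it barks"
updated = save_from(record, "kanagram", shared, own)
```

The important property is the pass-through in save_from: an application never needs to understand another application's section in order to preserve it.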

From the terminology that you use on the data handling page
("Dataset editors are not forcibly only activity-specific."), I think
you are well aware of these issues.

Anyway, if we proceed to merge these, it would be helpful if you could
pick an application to use as a target. I was planning on using
KAnagram, Artikulate, Parley and Parley's editor as targets of increasing
feature richness. Ideally, a good target would exercise a superset of the
features GCompris expects from the new library.

>This may be a list of words for a hangman, letters for a typing tutor,
>images and voices for language learning tools, a text with holes for a
>reading exercise, ...
> 
>As you can see, the types of exercises are very different and we cannot
>end up with a dataset structure common to all of them. Also, an
>important part of the task is to provide a way for teachers to create
>datasets, assign them to children and, if they want, share them.
> 
>Based on our requirements we ended up with a different proposal than
>yours, but we are also at an early stage with it. Holger just wrote up what
>we came up with in Randa on our wiki:
>http://gcompris.net/wiki/Dataset_handling 
> 
>As you can see, in our idea we define a 'datatype' which would be common
>to all and a 'payload' which would be readable only by a given activity
>and an editor, according to its mime type. Thus the whole infrastructure we
>can set up to manage datasets is not specific to a given type of exercise.
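If I read the datatype/payload split correctly, it amounts to something like the following Python sketch. The field names ("datatype", "title", "mime_type", "payload"), the register/open_payload helpers and the mime type string are my own hypothetical stand-ins, not taken from your wiki page: the generic infrastructure only looks at the datatype, and the payload is handed to whatever is registered for its mime type.

```python
from typing import Callable

# Hypothetical envelope: "datatype" is metadata every tool understands;
# "payload" is opaque except to the handler registered for its mime type.
Handler = Callable[[str], object]
_handlers: dict[str, Handler] = {}


def register(mime_type: str, handler: Handler) -> None:
    """An activity or editor declares which payload type it can read."""
    _handlers[mime_type] = handler


def describe(dataset: dict) -> str:
    """Generic infrastructure (listing, sharing, assigning) reads only datatype."""
    dt = dataset["datatype"]
    return f"{dt['title']} ({dt['mime_type']})"


def open_payload(dataset: dict) -> object:
    """Hand the payload to whichever handler owns its mime type."""
    mime = dataset["datatype"]["mime_type"]
    if mime not in _handlers:
        raise LookupError(f"no activity or editor registered for {mime}")
    return _handlers[mime](dataset["payload"])


# Example: a hangman-style word list stored as a comma-separated payload.
register("application/x-hangman-words", lambda payload: payload.split(","))
animals = {
    "datatype": {"title": "Animals", "mime_type": "application/x-hangman-words"},
    "payload": "cat,dog,cow",
}
```

In this reading, nothing outside the registered handler ever interprets the payload, which is exactly where my concern below comes from.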

I have a concern here that I will gently raise.

As you pointed out, some data types have natural semantics, which makes it
easy to generalize them into types that many applications can re-use:
alphabets, words, grammar, spoken words, sets of things. My question is
whether mixing application-specific information with more general,
semantically useful information is what people want. I think this is also
CoLa's concern about my desire to include vocabulary structure (i.e.
grammar) in the file format.

>Being a Qt Quick application, we selected json as the format of choice, as
>it is more human-readable and native.

I agree that human readability and editability should be a primary goal of this file format.

I think YAML is a better choice. For non-programmers, balancing braces {}
and balancing <tags/> are equally difficult. If someone copies and pastes a
section and misses a brace, the file stops working, and the missing brace
won't stand out to a non-programmer. I think people are more likely to
notice mismatched indentation and fix it to match the section they copied
from.
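To illustrate the brace problem, here is a small Python check using only the standard json module; the dataset content is made up. A single lost closing brace makes the whole file unreadable, and the parser reports the error where it gives up (line 4), not where the brace actually went missing (line 3):

```python
import json

good = """{
  "words": [
    {"text": "cat"},
    {"text": "dog"}
  ]
}"""

# The same file after a copy-paste that lost the "}" on line 3.
broken = """{
  "words": [
    {"text": "cat",
    {"text": "dog"}
  ]
}"""


def check(text: str) -> str:
    """Return "ok", or the position and message the JSON parser reports."""
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"


print(check(good))    # ok
print(check(broken))  # points at line 4, one line after the real mistake
```

This only demonstrates the JSON side; I left out a YAML comparison because Python's standard library has no YAML parser.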

If your Qt Quick applications interpret the JSON directly as JavaScript,
note that the primary objective of YAML 1.2 was compatibility with JSON,
so JSON can be embedded directly in a YAML file. Supporting JSON payloads
for some datasets in an otherwise YAML file would be a more readable
alternative.

>Also, we have not mentioned it on this wiki page, but we are already
>distributing the voice files of the new GCompris as Qt qrc files. They are
>Qt-specific but very easy to manage, because you can load them
>dynamically and then access their content through a qrc:// url anywhere in
>Qml. To us, 'qrc' is a good candidate for the container of the datasets as
>it is Qt native.

Qrc works well within a single application. If the data is intended to be
re-used by multiple applications, it needs to be external to the
application, perhaps in the zip.
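As a sketch of the external-zip option, here is how any application, Qt or not, could read a dataset out of a shared archive using only Python's standard zipfile and json modules. The datasets/<name>.json layout inside the archive and both helper names are hypothetical:

```python
import io
import json
import zipfile


def load_dataset(archive, name: str) -> dict:
    """Read the hypothetical member datasets/<name>.json from a shared zip.

    `archive` may be a filesystem path or a binary file object."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(f"datasets/{name}.json") as member:
            return json.load(member)


def build_archive(datasets: dict) -> io.BytesIO:
    """Pack datasets into an in-memory zip (a stand-in for a distributed file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in datasets.items():
            zf.writestr(f"datasets/{name}.json", json.dumps(data))
    buf.seek(0)
    return buf


archive = build_archive({"animals": {"words": ["cat", "dog"]}})
print(load_dataset(archive, "animals"))  # {'words': ['cat', 'dog']}
```

A Qt application could still load the same files into its own resource system; the zip just keeps the data from being tied to one toolkit.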

>Some feedback on your proposal: I am confused by the 'confidence level'.
>If it is a student mark, it may not be desirable to put it in the
>dataset itself, because it makes sense to keep the dataset in a read-only
>storage area (most distros will do that). On this topic, at GCompris we are
>interested in a teacher-specific tool to help teachers in their daily usage;
>we started specifying it here:
>http://gcompris.net/wiki/Administration_design
> 

I think Inge explained this elsewhere, but I will elaborate. We plan to
overlay files so that vocabulary building, lesson planning and training can
be separate stages. A single user might find it most convenient to use one
monolithic file for all stages, but in other contexts a student might
reference read-only files for different sections of the data. For example,
this overlay stack:

(Words and Grammar) Read-only, system file
(Course Plan) Read-only, different source, perhaps teacher-editable
(Student Goals and Training Data) Editable per user
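The overlay stack maps quite directly onto Python's collections.ChainMap, which searches its mappings in order and sends all writes to the first one. This is only a sketch: the flat "entry.field" keys and the sample data are hypothetical, and the read-only layers are wrapped in types.MappingProxyType so they cannot be modified through those references:

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical flat keys of the form "entry.field".
# Bottom layer: words and grammar, a read-only system file.
words_and_grammar = MappingProxyType({
    "Hund.translation": "dog",
    "Hund.gender": "masculine",
    "Katze.translation": "cat",
})
# Middle layer: the course plan, read-only for the student.
course_plan = MappingProxyType({
    "lesson1.words": ("Hund", "Katze"),
    "Katze.translation": "cat (domestic)",  # the teacher refines a system entry
})
# Top layer: student goals and training data, editable per user.
student = {}

# Lookups search top to bottom; all writes land in the first (student) layer.
data = ChainMap(student, course_plan, words_and_grammar)


def layer_of(chain: ChainMap, key: str) -> int:
    """Index of the layer that currently supplies `key` (0 = student)."""
    for index, layer in enumerate(chain.maps):
        if key in layer:
            return index
    raise KeyError(key)


data["Hund.confidence"] = 2  # training data is recorded per user
print(data["Katze.translation"])  # cat (domestic)
print(student)                    # {'Hund.confidence': 2}
```

The single-user, monolithic case is then just a chain with one editable layer.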

Cheers Andreas