The next file format
cordlandwehr at kde.org
Sun Aug 17 21:06:28 UTC 2014
Hi, to keep this mail no longer than it has to be, find my comments inline.
> 1. It should be a container format that can contain every aspect of
> collection inside it. The container itself should be ZIP.
Recently, I often hear that XZ has much better compression ratios than GZIP.
But I am fine with any of them.
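
A quick way to check the compression claim on one's own data is sketched
below. It is only an illustration: the sample payload is made up, and it
compares raw DEFLATE (the algorithm normally used inside ZIP and gzip)
against XZ on a single byte string. Note that xz compresses a single stream
and is not an archive format by itself, so using it for a multi-file
container would need an extra archive layer such as tar.

```python
import lzma
import zlib

# Made-up, highly repetitive XML-like payload standing in for a vocabulary file.
sample = b"".join(
    b'<entry id="%d"><text>word %d</text></entry>\n' % (i, i) for i in range(2000)
)

deflated = zlib.compress(sample, 9)  # DEFLATE, as normally used by ZIP and gzip
xz_packed = lzma.compress(sample)    # LZMA2 in an .xz stream (module default)

# Both codecs must round-trip the data without loss.
assert zlib.decompress(deflated) == sample
assert lzma.decompress(xz_packed) == sample

for name, blob in (("deflate", deflated), ("xz", xz_packed)):
    print(f"{name}: {len(blob)} bytes ({len(blob) / len(sample):.1%} of original)")
```

The numbers depend heavily on the input, so this should be run against real
collection files before drawing conclusions.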
> 2. Words and lessons should be separated from the training data inside
> 3. We should still base the files inside the container on XML -
> except the multimedia attachments.
XML is sooo 90s ;) (but no objection/bike-shedding from me)
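
For illustration, a ZIP container with XML members could be assembled roughly
as follows. This is a sketch under assumptions, not a proposal for the final
layout: the mimetype string and the XML element names are hypothetical, the
member names collection.xml and training.xml and the 'mimetype' first member
are taken from the suggestions quoted further down, and the manifest is left
out for brevity.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

MIMETYPE = "application/x-example-collection"  # hypothetical, not a registered type


def build_container(collection_xml: bytes, training_xml: bytes) -> bytes:
    """Pack the XML members into an in-memory ZIP container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # 'mimetype' is written first and stored uncompressed, as ODF and EPUB do.
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("collection.xml", collection_xml)
        zf.writestr("training.xml", training_xml)
    return buf.getvalue()


# Hypothetical element names, only to have something to serialize.
collection = ET.Element("collection")
entry = ET.SubElement(collection, "entry", id="0")
ET.SubElement(entry, "text").text = "Haus"
training = ET.Element("training")

blob = build_container(ET.tostring(collection), ET.tostring(training))

# Read the container back and parse the main file again.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    names = zf.namelist()
    first = zf.infolist()[0]
    stored_mimetype = zf.read("mimetype")
    root = ET.fromstring(zf.read("collection.xml"))
```

Keeping the words and the training data in separate members means a reader
can replace one without touching the other.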
> Now, here are some suggestions that I don't think are very controversial.
> we can get past this quickly, we can start in on the details as soon as
> 1. The new format should copy some of the details from the Open
> Format. This is a good format that works well and for which there are
> nice tools already. The ebook format EPUB also uses the same
> a large degree. Specifically: 1.1 The first file inside it should be called
> 'mimetype' and contain the mimetype for the file. 1.2 There should be a
> manifest file which lists the type and name of all the files inside the
> container. ODF uses META-INF/manifest.xml which works for me.
> 1.3 multimedia files (pictures, video, audio, ...) are put in the container
> and referred to using <xlink> tags. There *could* also be links to
> files but that should be avoided. 1.3.1 There is no mandatory place to
> the attachments but Pictures/, Video/ and Audio/ are preferred paths.
> 1.4 There is a file for metadata called meta.xml.
> 1.5 There is a file for user settings called settings.xml (is this
> necessary?) 1.6 There is a thumbnail file which can be shown in e.g. a
> browser called Thumbnails/thumbnail.png (is this necessary?)
> 2. I suggest that we name the main file collection.xml and the training
> status training.xml.
> 3. Everything inside the collection.xml file should have an id property
> which is a numerical number that should form a consecutive series.
> numbers are only unique within their domain (e.g. words and identifiers
> both use id's 0 and up). This means that attachments for a word, e.g. a
> picture, does also have an id, which is not the case now.
Fine with me, except for the use of IDs. There we should use either UUIDs or
identifier strings like "org.kde.edu.$COLLECTION.$UUID". The reasons:
* it would allow "updates" of a course (in the sense of updating the
structure from a new version while preserving the training data)
* this update mechanism could also be used to have system-wide installed
courses (which are read-only) from which the users' courses are
updated (classroom situation)
* it allows for collaborative work on a course, as we do in Artikulate
* files that are associated with an entry should then also be prefixed
according to the ID.
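
The update scenario from the list above can be sketched as follows. This is a
minimal illustration, assuming training data is kept as a mapping from entry
ID to some record; the function names, the record layout and the attachment
path scheme are all hypothetical. Only the "org.kde.edu.$COLLECTION.$UUID"
pattern comes from the text above.

```python
import uuid


def make_entry_id(collection: str) -> str:
    """Build an identifier of the form org.kde.edu.<collection>.<uuid>."""
    return f"org.kde.edu.{collection}.{uuid.uuid4()}"


def carry_over_training(new_entry_ids, old_training):
    """Keep training data for entries that still exist in the updated course.

    Entries that are new in the update start without training data (None);
    training data of entries removed by the update is dropped.
    """
    return {entry_id: old_training.get(entry_id) for entry_id in new_entry_ids}


def attachment_path(entry_id: str, filename: str) -> str:
    """Prefix an attachment with the ID of its entry (hypothetical layout)."""
    return f"Audio/{entry_id}/{filename}"


a = make_entry_id("spanish-basics")
b = make_entry_id("spanish-basics")
old_training = {a: {"grade": 3}, b: {"grade": 1}}

# The new course version removed entry b and added entry c.
c = make_entry_id("spanish-basics")
merged = carry_over_training([a, c], old_training)
```

With consecutive numeric IDs the same merge would be fragile, since inserting
or removing an entry in the new version could shift the numbers of all later
entries.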
> 4. confidence levels inside the training.xml files always refer to *pairs*
> of items. Examples: translation from a word to another word, translation
> from an audio file to a written word. These entities can be uniquely
> identified by the tree of id's (e.g. entry 4, translation 2, attachment 2
> for the audio file for the 2nd translation of the 4th entry). See below
> for a question about training types.
If I understand this correctly, you propose to have essentially a general
purpose database that stores triplets. (Which sounds absolutely fine to
me.) The only thing I wonder is why that should be done in XML and not e.g.
with a small embedded SQLite database (or similar).
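
To make the SQLite alternative concrete, here is a minimal sketch of a
triplet store (source item, target item, confidence level). The table name,
column names and ID strings are assumptions for illustration only; nothing
here is an agreed schema.

```python
import sqlite3

# In-memory database for the example; a real one would be a file.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE confidence (
        source_id TEXT NOT NULL,
        target_id TEXT NOT NULL,
        level     INTEGER NOT NULL,
        PRIMARY KEY (source_id, target_id)
    )
    """
)


def set_confidence(conn, source_id, target_id, level):
    """Insert or overwrite the confidence level for a directed pair of items."""
    conn.execute(
        "INSERT OR REPLACE INTO confidence (source_id, target_id, level) "
        "VALUES (?, ?, ?)",
        (source_id, target_id, level),
    )


def get_confidence(conn, source_id, target_id):
    """Return the stored level for the pair, or None if it was never trained."""
    row = conn.execute(
        "SELECT level FROM confidence WHERE source_id = ? AND target_id = ?",
        (source_id, target_id),
    ).fetchone()
    return row[0] if row else None


# Hypothetical IDs: audio attachment of a translation -> its written word.
set_confidence(conn, "entry4/translation2/attachment2", "entry4/translation2", 3)
set_confidence(conn, "entry4/translation2/attachment2", "entry4/translation2", 5)
```

The pair is directed in this sketch, so training from audio to written word
is tracked separately from the reverse direction.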
One more point, which I did not find here yet, is the language
specifications. In my opinion that is data that has to be shipped with the
application itself (or made available for download on demand by some online
service). In any case, it should not belong to the lesson file.