Regarding our language tools: Data

Amarvir Singh amarvir.ammu.93 at gmail.com
Mon Feb 10 23:05:47 UTC 2014


Adding to the second point about a graph-like structure (which I think is a
great idea, btw), we should implement Personas here, as Andreas mentioned in
the other thread. Using this, we could require certain words to be filled in
by any user who wishes to add his/her language. This would constrain the
data needed to complete a vocabulary collection for a language, and set a
defined path for the collection author to follow. Artikulate seems to have
done this, and they are working towards an impressive collection for
multiple languages. And with a defined path set, you don't need people with
experience in training or teaching a language to produce a great, usable
collection. Even our KDE team, with members across the globe, could do it,
something Artikulate has already benefited from ;) .

Also, by defining a certain set of needed words, we could auto-generate a
collection in a language, say Thai, using Wiki or an online translator, and
get a native speaker on our team to validate and correct it. This would
decrease their workload too.

And regarding the issue of "to clean" versus "is clean" in Thai, this could be
handled by making the initial, path-defining list in English be composed of
sub-lessons, which we already have. But these lessons would be well defined,
e.g. verbs part 1, adjectives part 1. So clean would be present in multiple
forms, as "to clean" (v) and "is clean" (adj).
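
To make that concrete, here is a rough sketch (plain Python, all names
invented for illustration, not an existing Parley feature) of how such a
path-defining English seed list could look, with each entry carrying a word
class so that "clean" appears twice:

SEED_LIST = [
    # Hypothetical seed list: every new language collection would need to
    # provide a translation for each of these entries.
    {"lesson": "verbs part 1",      "english": "to clean", "word_class": "verb"},
    {"lesson": "verbs part 1",      "english": "to eat",   "word_class": "verb"},
    {"lesson": "adjectives part 1", "english": "is clean", "word_class": "adjective"},
    {"lesson": "adjectives part 1", "english": "is big",   "word_class": "adjective"},
]

def missing_entries(contributed):
    """Seed entries that a contributed collection has not filled in yet."""
    done = {(e["english"], e["word_class"]) for e in contributed}
    return [s for s in SEED_LIST
            if (s["english"], s["word_class"]) not in done]

A little tool like this could then tell a contributor exactly which words are
still missing before the collection counts as complete.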

We could also save on uploading and downloading images with collections
and instead package them with Parley itself, as they would be used for
every language collection anyway. It's the sound files that need to be stored
with the language vocabulary.
On 10 Feb 2014 23:51, "Inge Wallin" <inge at lysator.liu.se> wrote:

>  On Sunday, February 09, 2014 16:52:58 Inge Wallin wrote:
>
> > What I will do is to describe my experiences with some online tools,
> > what I learned from them and discuss how we could apply that to our
> > own applications.
>
> Ok, I am going to continue this thread by discussing some topics in more
> detail. The first mail was an overview of what I think and why, and was not
> focused on specific problems or solutions. This mail, and possibly some
> more in this series, are going to be.
>
>
>
> This mail will be about our data.
>
>
>
> I have thought a lot about who would use Parley in particular, and have come
> to the conclusion that the normal school kid is not it. Definitely not in its
> current incarnation, but I don't think that s/he would be even if we made it
> more focused on the learning process.
>
>
>
> Instead, I think that people like myself, adults who want to learn a new
> language, should be our first target group. We are serious in our studies,
> we want to build a large vocabulary, and we have the discipline to see things
> through. But we want to be supported in our large-scale learning by the
> application, not use it mostly to create a file for today's homework of 15
> words and then study it during one evening.
>
>
>
> I have a lot to say about the interaction design for this too, but I will
> leave that for another mail. Instead I will focus on the data design this
> time.
>
>
>
> Now, the above means that we need to support learning large collections of
> words efficiently. And we need to support creating those too. And
> copying/transferring/downloading them.
>
>
>
> First of all, I think multimedia is an immensely important part of the
> learning process. I would say it's impossible to learn a word well in an
> unknown language if you can't hear it pronounced by a native speaker. So
> sound files should always be part of a collection. (As a side note, I will
> use the word collection here instead of lesson - a lesson is something you
> have with a teacher; the collection is a ...well... collection of words).
>
>
>
> And we need tools to be able to create collections quickly. But first, let
> us take a step back.
>
>
>
> I have looked at the DTD for kvtml, the current XML-based file format for
> our language tools. And I notice that there are some things missing. First
> of all, there is no way to describe a language. I think we need a separate
> way of describing each language. They vary a lot, not just in their
> vocabulary but in many other aspects too. For one thing, different
> languages have different word classes. Most Western languages use
> conjugation of verbs. Asian languages mostly do not. The word classes verb
> and adjective are present in almost all languages, but others are not, e.g.
> particles. Some languages use genders for their nouns, others do not. And
> it goes on.
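>
> To make this a bit more concrete, here is a minimal sketch of what a
> per-language description could contain. This is purely illustrative Python
> with invented field names, not a proposal for the exact kvtml syntax:
>
> LANGUAGE_PROFILES = {
>     # Illustrative only: which word classes exist and which grammatical
>     # features the study modes should care about.
>     "de": {"word_classes": ["noun", "verb", "adjective", "adverb"],
>            "verbs_conjugate": True,  "noun_genders": ["m", "f", "n"]},
>     "th": {"word_classes": ["noun", "verb", "adjective", "particle"],
>            "verbs_conjugate": False, "noun_genders": []},
> }
>
> def practice_modes(lang_code):
>     """Offer only the practice modes that make sense for this language."""
>     profile = LANGUAGE_PROFILES[lang_code]
>     modes = ["flashcard", "written", "multiple choice"]
>     if profile["verbs_conjugate"]:
>         modes.append("conjugation")
>     if profile["noun_genders"]:
>         modes.append("gender")
>     return modes
>
> With a description like this, practice_modes("th") would simply never offer
> a conjugation mode.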
>
>
>
> In this regard, I think it's also important that we be able to describe
> variations of languages. For instance, American and British English are
> almost completely the same, but the spellings of some words differ
> (color/colour). And in some cases there are different words for the same
> thing (lift/elevator). So you should be able to indicate which variation a
> word belongs to.
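>
> On the data side, a word entry could carry an optional variation tag, e.g.
> (field names again invented for illustration):
>
> ENTRIES = [
>     # The same word in two spellings, tagged with the variation it belongs to.
>     {"word": "colour",   "language": "en", "variation": "en-GB"},
>     {"word": "color",    "language": "en", "variation": "en-US"},
>     {"word": "lift",     "language": "en", "variation": "en-GB"},
>     {"word": "elevator", "language": "en", "variation": "en-US"},
> ]
>
> def entries_for(variation):
>     """Pick the entries relevant to the variation the learner has chosen."""
>     return [e for e in ENTRIES if e.get("variation") in (None, variation)]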
>
>
>
> So we need a way to describe a language to make the UI of Parley relevant
> to the language that we study. For instance, having a mode to study
> conjugation when I learn Thai does not make sense, because Thai doesn't use
> conjugation at all.
>
>
>
> I also think we should separate our vocabularies from our collections for
> efficiency reasons. The word "yellow" has the same pronunciation and would
> use the same image whether it's part of a collection of colors or
> part of a general collection of the 500 most common words.
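>
> Roughly what I have in mind, as an illustrative sketch (not real kvtml): the
> media lives with the word entry, and a collection only refers to entries by id:
>
> VOCABULARY = {
>     # Shared vocabulary: each entry carries its media exactly once.
>     "en:yellow": {"word": "yellow", "sound": "yellow-en.ogg", "image": "yellow.png"},
>     "en:red":    {"word": "red",    "sound": "red-en.ogg",    "image": "red.png"},
> }
>
> COLLECTIONS = {
>     # Collections only reference vocabulary entries, so "yellow" can be part
>     # of both without duplicating its sound file or image.
>     "colors":           ["en:yellow", "en:red"],
>     "500 common words": ["en:yellow", "en:red"],
> }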
>
>
>
> Naturally, we should still have a file format that covers all of what we
> support now and that allows easy download of a collection, complete with
> everything needed for efficient learning. This is the current kvtml format,
> and it is good for the end user to download and learn from.
>
>
>
> But we should also have a kind of back-office storage of the full
> vocabulary of our supported languages. This could be kept in a central
> database that could be replicated in full or in part to any user. And the
> user could work on it and upload his or her extensions to it. In the end we
> would have a pretty extensive collection of words in many different
> languages.
>
>
>
> The second part of this central database would be a set of translations.
> If we consider the words as nodes in a graph, then the translations would
> be the edges. Some languages have one-to-one translations of certain words
> to other languages, some don't. For instance, I mentioned in my last mail
> the example of translating "clean" into Thai. But it didn't indicate whether
> it was the verb "to clean" or the adjective "is clean". In Thai they
> are different words, as they also are in Swedish.
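>
> In code, a first sketch of such a graph could be as simple as a set of edges.
> The ids below are placeholders I made up for illustration; the important
> thing is that the verb and the adjective are separate nodes:
>
> TRANSLATION_EDGES = {
>     # One node per meaning, one edge per known translation.
>     ("sv:städa (v)",      "en:to clean (v)"),
>     ("sv:ren (adj)",      "en:is clean (adj)"),
>     ("en:to clean (v)",   "th:<the verb 'to clean'>"),
>     ("en:is clean (adj)", "th:<the adjective 'is clean'>"),
> }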
>
>
>
> Now, if we have the above in place, i.e. vocabularies centrally stored
> (replicable to users' computers), and a set of translations of words in one
> language to another, we should be able to almost auto-generate collections.
> To create a lesson, you would specify a list of words in the target
> language (Thai in my case), say which language you want to go from (e.g.
> Swedish) and say to the database tool: create a collection of these words,
> complete with images, sound files, etc., and store it to my hard disk. If
> some words don't have translations in the system, it could be forced to do
> transitive translations, e.g. Swedish-English-Thai. Sometimes this would
> give the wrong result, but that could be regarded as a bug which could be
> fixed by just adding a specific Swedish-Thai translation for that
> particular word.
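>
> A rough sketch of that lookup, building on the edge set above (everything
> here is invented for illustration, not an existing tool):
>
> def translate(word_id, target_lang, edges=TRANSLATION_EDGES, pivot="en"):
>     """Return a direct translation if one exists, otherwise try a
>     transitive one through a pivot language (e.g. Swedish-English-Thai)."""
>     def neighbours(w):
>         return {b for a, b in edges if a == w} | {a for a, b in edges if b == w}
>
>     # 1. a direct edge between word_id and the target language
>     for n in neighbours(word_id):
>         if n.startswith(target_lang + ":"):
>             return n
>     # 2. fall back to an edge through the pivot language; a wrong result
>     #    here is fixed by adding a direct edge for that particular word.
>     for n in neighbours(word_id):
>         if n.startswith(pivot + ":"):
>             for m in neighbours(n):
>                 if m.startswith(target_lang + ":"):
>                     return m
>     return None
>
> So translate("sv:städa (v)", "th") would go through English as long as no
> direct Swedish-Thai edge exists for that word.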
>
>
>
> These collections could then be stored just like the current ones, on a GHNS
> server or in a Bodega store or anywhere else.
>
>
>
> Another good thing about this approach is that we will probably be able to
> auto-generate some of the vocabulary metadata (sound/images/...) from places
> like Wikimedia and other free resources. And we could create tools, similar
> to what our translators use, for maintaining the vocabularies, keeping them
> up to date and extending them.
>
>
>
> I will stop here and wait for any reactions.
>
>
>
> -Inge
>
>
>