[Parley-devel] Features in next version of Parley

Andreas Xavier andxav at zoho.com
Fri Sep 12 15:35:46 UTC 2014


---- On Tue, 09 Sep 2014 15:38:11 -0700 Inge Wallin  wrote ---- 

>Hi Andreas, 
> 
>This is beautiful! Did you come up with this yourself or did you read about 
>it anywhere? 
> 

None of this is original theory.

The forgetting curve material is from a variety of papers on spaced retrieval 
and memory persistence that I read before my laptop went kaput, so I no 
longer have the references.

For the choice of what data to collect, I followed the international language
testing standards for the EU, China and two others, which I think were US 
college English language proficiency and Japanese.

The statistics and estimation theory are also not of my creation.

I just jammed them into one file.

I want to emphasize that this was an exercise to determine what data needed
to be in the new file format so that the student's learning state is
observable.  I also checked some of the stability constraints imposed
on the estimator by different types of student behavior.

I have half a dozen estimators and two different schedulers.  Before I provide 
more details I want to find the language learning papers again and, most 
importantly, work with some real data.  So I plan to drop this until the new 
data is in the new file format, at which point I can do some useful work.

>The description below was quite understandable but there are still a few 
>things that I don't understand. See comments inside. 
> 
>    -Inge 
> 
> 
>On Sunday, August 31, 2014 09:01:36 AM Andreas Xavier wrote: 
>> Hello, 
>> 
>> In this response, I will describe the method in greater detail and then 
>> address your specific questions in the body of your email. 
>> 
>> The method was developed as throw away code to determine what data is 
>> needed in the file format to do useful estimation and scheduling. It does 
>> re-implement some existing Parley functionality. It fixes Leitner's 
>> implicit estimation problem with short training intervals, late training, 
>> and extra training. 
>> 
>> The system is a series of stages that are independent and run in sequence. 
>> The stages are the estimator, scheduler, game/method selector, game/method 
>> and training data storage. It is encapsulated well enough that you can swap 
>> out stages, game/methods or grammar. The encapsulation depends on two 
>> things: the trainable item and the game/method. 
>> 
>> A trainable item is a word/phrase/picture/etc. from the database that is a 
>> target of some game/method. Trainable items have training data stored. 
>> Native language words and mnemonics are not trainable items. A whole set of 
>> inflections is a trainable item. The individual conjugations might also be 
>> trainable if we had a game/method that did individual conjugations. 
>> Individual gendered words are trainable, but the set of all gendered words 
>> probably isn't. The individual active games decide what is trainable. 
>> 
>> Training data is added at each training event and is a list of events. Each 
>> event is either of the form (trainable-id, type (listen/read/speak/write), 
>> datetime, success/failure) or, for a user self-assessment, (trainable-id, 
>> type (listen/read/speak/write), datetime, self-assessment). 
>> 
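
(As an aside, to make the event layout concrete: here is a minimal sketch of 
one stored training event as a Python structure. The field names are 
illustrative, not the actual file format.)

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TrainingEvent:
    trainable_id: int                      # which word/phrase/inflection set was tested
    mode: str                              # "listen", "read", "speak" or "write"
    when: datetime                         # time of the training event
    success: Optional[bool] = None         # pass/fail result, if graded by a game
    self_assessment: Optional[int] = None  # user's own grade, if self-assessed
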
>> A key part of the encapsulation is the game/method. Each game/method 
>> object does two things. It runs one test of the user. It also answers the 
>> question, "Can this game/method train this item?" The game/method is the 
>> only part that understands whether a trainable item expands into 6 
>> conjugations, 3 antonyms, a prepositional phrase or something else. The 
>> game/method is the only part that knows what mode of speech this tests and 
>> what gui front end is used (flash card/multiple choice/written word) etc. 
>> All of the other stages just treat trainable items as opaque. 
>> 
>> Here is a more detailed explanation of the pipeline 
>> 
>> Estimator 
>> input: training data 
>> output: list of (trainable item, estimated time constant) 
>> how: 
>> 
>> For each trainable item this estimates the current time constant from the 
>> training data. You could plug in the Leitner method here except it unduly 
>> rewards early practice. This method has a roll-off for short training 
>> intervals. 
> 
>This is the part that I have most problems with. I assume that the 
>"estimated time constant" is the point in time which is optimal for 
>reinforcement of this particular item. Right? 
> 
>But *how* is this constant calculated? You say that you use the forgetting 
>curve, but it's not at all obvious how this is done. 
> 
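
To give a flavor of what could go here without committing to details: one 
simple estimator assumes an exponential forgetting curve, R(t) = exp(-t/tau), 
and nudges tau after every training event in proportion to how surprising the 
result was. This is a toy sketch, not one of the estimators I mentioned above:

import math

def estimate_tau(events, tau0=1.0, rate=0.5):
    """events: list of (when: datetime, success: bool), oldest first.
    Returns the estimated time constant tau, in days."""
    tau = tau0
    last = None
    for when, success in events:
        if last is not None:
            dt = (when - last).total_seconds() / 86400.0  # days since last test
            predicted = math.exp(-dt / tau)               # predicted recall probability
            # Surprise-weighted update: a success at a long interval raises
            # tau a lot; a failure at a long interval barely lowers it.
            error = (1.0 if success else 0.0) - predicted
            tau *= math.exp(rate * error)
        last = when
    return tau

Note how the roll-off for short intervals falls out naturally: when dt is 
tiny, predicted recall is close to 1, so an early easy success barely moves 
tau. That is the Leitner problem mentioned above.
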
>> Scheduler 
>> inputs: list of (trainable item, estimated time constant) 
>> output: list of trainable items to be trained now 
>> how: 
>> 
>> This is simple. It looks at the number of times that the user wants to 
>> train, the number of trainable items pending and any new untrained items. 
>> It makes a list. It could be scheduling individual items in continuous 
>> mode, or a block of items. 
> 
>I assume that the strategy for picking the things to train now is fixed? I.e. 
>the list of trainable items is sorted in some way? But how is this done, 
>i.e. how is the numerical priority calculated from the estimated time 
>constant per item and the current time? You give some clues below but 
>those are not enough for me to understand the details. 
> 

I have several schedulers that work.  
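
To give one concrete flavor anyway (an illustrative sketch, not one of those 
schedulers): treat an item as due once the time elapsed since its last 
training exceeds its estimated time constant, and when more items are due 
than the user will train, prefer the longer time constants, since training 
them clears more future schedule (see the under-practice discussion further 
down).

def schedule(items, now, budget):
    """items: list of (trainable_id, tau_days, last_trained: datetime).
    budget: how many items the user wants to train now."""
    due = []
    for trainable_id, tau, last in items:
        elapsed = (now - last).total_seconds() / 86400.0
        if elapsed >= tau:                 # item is due for reinforcement
            due.append((tau, trainable_id))
    due.sort(reverse=True)                 # longest time constant first
    return [tid for _, tid in due[:budget]]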


>> Game Selector 
>> input: list of trainable items 
>> output: list of (game, trainable item) 
>> how: 
>> 
>> For each trainable item this asks all of the active game/methods, "Can you 
>> train this item?". It then chooses a game for that item from the list of 
>> game/methods that can train it. An active game/method is one that is 
>> registered and that the student or the lesson has selected. 
> 
>Shouldn't this also take into account the type of training that the user 
>wants to do (read/listen/speak/write)? 
> 
>In the future Parley could have checkboxes for (at least) those 4 types and 
>we could schedule sessions with, say, writing using the written answer 
>widget, reading using the flashcard widget and speaking using some method 
>from Artikulate. 
> 

Yes, the game selector only works from the set of active games/methods.  
If the user has excluded written practice, it is excluded.
If the lesson plan excludes the Artikulate style, then it is also excluded.
It may be that an item needs to be trained but there are no available training
methods, so the item gets dropped and is not trained.
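
A minimal sketch of that selection step, with a hypothetical Game interface 
(all names here are illustrative):

import random

class Game:
    """Hypothetical interface that each game/method implements."""
    def can_train(self, item):
        """Answer "Can you train this item?"."""
        raise NotImplementedError
    def run(self, item):
        """Run one test and return the resulting training events."""
        raise NotImplementedError

def select_games(items, active_games):
    """Pair each trainable item with one active game that can train it.
    Items that no active game can train are dropped, as described above."""
    plan = []
    for item in items:
        candidates = [g for g in active_games if g.can_train(item)]
        if candidates:
            plan.append((random.choice(candidates), item))
        # else: dropped -- no available training method for this item
    return plan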


>> Game/Method 
> 
>You keep mentioning game... I suppose you have something in mind 
>here? Do you have any specific gamifications that you are thinking of? 
> 

In one of your online documents you mentioned using "mode" for training vs. 
testing. I think referring to the individual flash card/multiple choice/mixed 
letter etc. exercises as games is less ambiguous than "method", particularly 
in an email about estimation methods and scheduling methods. Even in general, 
I think asking the user "What games do you want to play?" is more inviting. 

>> input: one trainable item, one user/student, database 
>> output: list of (trainable item, type (listen/read/speak/write), datetime, 
>> success/failure) 
>> how: 
>> 
>> The game/method pulls any additional items it needs from the database, 
>> chooses a gui, runs the training with the user and then returns a list of 
>> all of the items trained and their results. 
>> 
>> For example, a conjugation trainer might receive one trainable item (an 
>> infinitive, present tense). It tests 6 present tense conjugations. The 
>> student gets 5 right and 1 wrong, so the game/method returns 7 trainable 
>> items: 5 right and 1 wrong conjugation, and the original trainable item 
>> marked wrong. 
> 
>Hmm, why return the original marked wrong? 
>

It is an effort to avoid the "waterfall model" problem.  People remember most 
strongly the first item in a list and/or the item presented the greatest 
number of times.

The waterfall model was presented in a paper as a straw-man argument, an 
example of a non-working development model.  Because it was presented first 
and most clearly in the paper, the waterfall model is all anyone remembers 
of that paper.  It was subsequently expanded into a full development model 
that everyone loves to shoot down. 

https://en.wikipedia.org/wiki/Waterfall_model

Another example, closer to our case, is the following list:

WRONG WAY: this is the thing people will remember most strongly
RIGHT WAY: most people form a stronger memory of the first option in a list

People remember that there were 2 items in the list, one wrong and one right,
and they remember the first item in the list most strongly.  They remember 
the first item they read more strongly than they remember whether it was 
wrong or right.

Currently, Parley's exit statistics show people their own wrong answers, 
highlighted at the top of the list.  (Note: except for flash cards.)  We are 
reinforcing the wrong answers.

I think we could just have three sections in the table, separated by gaps.  
Omit the "your answer" section, or make it hidden until clicked on, so that 
the first thing the student sees is the correct answer.

Wrong Answers
Question   Solution
black      **bright color**
black      **bright color**

<gap>
Correct Answers
Question   Solution
black      **green**
black      **green**

<gap>
Not Answered
Question   Solution
black      black
black      black
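
To make the conjugation trainer example above concrete, here is a sketch of 
how such a game could report its 7 training events (the function and the 
"write" mode are illustrative assumptions):

from datetime import datetime

def run_conjugation_game(item_id, answers):
    """answers: list of (conjugation_id, correct: bool) for the 6 forms."""
    now = datetime.now()
    events = [(cid, "write", now, ok) for cid, ok in answers]
    # The whole conjugation set is marked wrong if any single form was
    # missed, so the set as a unit comes around again soon.
    events.append((item_id, "write", now, all(ok for _, ok in answers)))
    return events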

 
>> Data Storage 
>> input: list of (trainable item, type (listen/read/speak/write), datetime, 
>> success/failure) 
>> output: training data 
>> 
>> how: Save to database. 
> 
>This is going to be massive in some time! We need a real database here. 
>Luckily this work is already started with Amarvir's GSoC work. 
> 
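
For what it's worth, the event list above maps naturally onto a single 
append-only table. A minimal sketch, assuming SQLite (the schema is 
illustrative, not a proposal for the real database work):

import sqlite3

conn = sqlite3.connect("training.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS training_events (
        trainable_id INTEGER NOT NULL,  -- which item was tested
        mode         TEXT    NOT NULL,  -- listen/read/speak/write
        at           TEXT    NOT NULL,  -- ISO-8601 datetime
        success      INTEGER            -- 1 right, 0 wrong, NULL self-assessed
    )
""")
conn.execute("INSERT INTO training_events VALUES (?, ?, ?, ?)",
             (42, "write", "2014-09-12T15:35:46", 1))  # example row
conn.commit()
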
>> > There are some things I don't understand in your description below. For 
>> > instance how you calculate the optimal intervals for an individual? Or do 
>> > you? 
>> The optimal interval is per user, per word and per mode. This version 
>> doesn't calculate the optimal intervals, but I think there will be enough 
>> information in the data to do a better job by plugging in a better 
>> scheduler later. The optimal interval also depends on many things: whether 
>> the student's goal is to learn quickly no matter how much time is spent 
>> studying, or to learn efficiently; how cognizant they are that incorrect 
>> answers reinforced just before you forget them are more effective than 
>> regular correct boring sameness. 
> 
>Haha, I think we have a convincing problem here. :) I myself would be 
>hesitant to use a method where I get lots of answers wrong if I didn't trust 
>it a lot. 
> 
>But more importantly, you still haven't told me how the optimal interval is 
>calculated. 
> 
>> > How is the algorithm affected if you don't train for some days even 
>> > though it is scheduled? 
>> The only stage affected by the amount of practice is the scheduler. If the 
>> user doesn't have time to practice all pending words, the scheduler prefers 
>> the longer time constant words because they clear more time in the future 
>> schedule. The result is that if a user consistently under-practices, they 
>> will first achieve mastery of a small subset from the beginning of the 
>> lessons. Eventually, if they are only under-studying by a small amount, 
>> they will master the whole file. 
> 
>This is very good. Just the way you would want it. (and incidentally also 
>the way that Parley currently works.) 
> 
>> > How is it handled when you introduce new words in a collection that has 
>> > already been trained for some time? Etc. 
>> If new words show up in the collection, then the scheduler will ask the 
>> game/method, "Can you train this word?" and proceed from there. 
>> 
>> Cheers Andreas 
>> 
>> 
>> _______________________________________________ 
>> Parley-devel mailing list 
>> Parley-devel at kde.org 
>> https://mail.kde.org/mailman/listinfo/parley-devel 
>
