[kde-edu]: KVTML files for Mandarin Chinese

Jeremy jeremy at scitools.com
Fri Sep 21 16:20:25 CEST 2007


On Thursday 20 September 2007 09:58:23 Kasim Terzic wrote:
> Hi,
>
> I have generated some kvtml files for Mandarin Chinese, after I saw
> that there were none. Please find the following files in the attached
> archive:
>
> hsk1.kvtml - List of characters from the HSK-A set (Basic)
> hsk2.kvtml - List of characters from the HSK-B set (Basic)
> hsk3.kvtml - List of characters from the HSK-C set (Elementary/Intermediate)
> hsk4.kvtml - List of characters from the HSK-D set (Advanced)
>
> top500.kvtml - The 500 most common characters, sorted by frequency
> next500.kvtml - The next 500 most common characters
>

Awesome, I've personally been looking for something like this for a while now.
CEDICT comes close for me, but it is not quite as nice as this. Actually, someone
in #kde-cn made a few kvtml files you might also be interested in. They are
in svn at /home/kde/trunk/l10n-kde4/zh_CN/data/kdeedu/kanagram/ . They were
created for KAnagram's use, so each entry is longer than a single word (one file
is Tang poems, the other 13 are Chinese idioms).

I see your files use simplified characters. Would you mind if I (or you)
converted them to traditional so that zh_TW users can enjoy them too? Also, are
these appropriate for the zh_CN and zh_HK locales? If so, I'll add them to both
in svn.

> The files are in utf-8 and work best with a Unicode font. They should
> also work well with a good GB font.

Perfect, they appear here just fine (I have Chinese fonts installed).
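
For anyone who wants to check the encoding independently of installed fonts, here is a minimal Python sketch. The function name and the choice to count only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) are my own assumptions for illustration, not something taken from the files or from KVocTrain:

```python
import sys


def count_cjk_in_utf8(data: bytes) -> int:
    """Decode bytes as strict UTF-8 and count basic CJK ideographs.

    Raises UnicodeDecodeError if the data is not valid UTF-8
    (for example, if a file was saved in a GB encoding instead).
    """
    text = data.decode("utf-8")  # strict error handling is the default
    # Assumption: only the basic CJK Unified Ideographs block is counted.
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


if __name__ == "__main__":
    # Usage: python check_kvtml.py hsk1.kvtml top500.kvtml ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, count_cjk_in_utf8(handle.read()))
```

A file that is really GB-encoded will normally fail the decode step rather than being silently miscounted, which is the main point of the check.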

>
> The HSK tables were taken from
> http://www.chinese-forums.com/vocabulary/, which seems to be free and
> is used by online dictionaries all over the web. The HSK is the
> standard Chinese proficiency test required for people wishing to
> work/study in China and a common way to gauge progress.
>
> The frequency tables were taken from
> http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=TO
> (WARNING: large document), which is a university research project and
> in the public domain as far as I can tell.
>
> The translations were taken from the CEDICT project,
> http://www.mandarintools.com/cedict.html, which uses a liberal,
> Creative Commons-like licence, which I included in the tarball.
>
> I have tested the files with KVocTrain 0.8.3.
>
> Please let me know if this is useful for the KDE Edu project and can
> be distributed with other data files. If there is interest, I could
> also generate the vocabulary lists (not just characters) for the
> different HSK levels.

Do you mean adding English and/or Chinese definitions for each entry? That
would also be nice, I think.

If you use IRC, I'd like to discuss these and other possibilities with you
sometime. I'm jpwhiting on freenode most of the time.

Jeremy Whiting

