[kde-edu]: KVTML files for Mandarin Chinese
Jeremy
jeremy at scitools.com
Fri Sep 21 16:20:25 CEST 2007
On Thursday 20 September 2007 09:58:23 Kasim Terzic wrote:
> Hi,
>
> I have generated some kvtml files for Mandarin Chinese, after I saw
> that there were none. Please find the following files in the attached
> archive:
>
> hsk1.kvtml - List of characters from the HSK-A set (Basic)
> hsk2.kvtml - List of characters from the HSK-B set (Basic)
> hsk3.kvtml - List of characters from the HSK-C set
>   (Elementary/Intermediate)
> hsk4.kvtml - List of characters from the HSK-D set (Advanced)
>
> top500.kvtml - The 500 most common characters, sorted by frequency
> next500.kvtml - The next 500 most common characters
>
Awesome, I've personally been looking for something like this for a while now.
CEDICT comes close for me, but it is not quite as nice as this. Actually, someone
in #kde-cn made a few kvtml files you might also be interested in. They are
in svn at /home/kde/trunk/l10n-kde4/zh_CN/data/kdeedu/kanagram/ . They were
created for KAnagram's use, so each entry is longer than a single word (one
file is Tang Poem, the other 13 are Chinese idioms).
I see your files use simplified characters. Mind if I (or you) convert them to
traditional so that zh_TW users can enjoy them too? Also, are these appropriate
for the zh_CN and zh_HK locales? If so, I'll add them to both in svn.
> The files are in utf-8 and work best with a Unicode font. They should
> also work well with a good GB font.
Perfect, they appear here just fine (I have Chinese fonts installed).
>
> The HSK tables were taken from
> http://www.chinese-forums.com/vocabulary/, which seems to be free and
> is used by online dictionaries all over the web. The HSK is the
> standard Chinese proficiency test required for people wishing to
> work/study in China and a common way to gauge progress.
>
> The frequency tables were taken from
> http://lingua.mtsu.edu/chinese-computing/statistics/char/list.php?Which=TO
> (WARNING: large document), which is a university research project and
> in the public domain as far as I can tell.
>
> The translations were taken from the CEDICT project,
> http://www.mandarintools.com/cedict.html, which uses a liberal,
> Creative Commons-like licence, which I included in the tarball.
>
> I have tested the files with KVocTrain 0.8.3.
>
> Please let me know if this is useful for the KDE Edu project and can
> be distributed with other data files. If there is interest, I could
> also generate the vocabulary lists (not just characters) for the
> different HSK levels.
Do you mean adding English and/or Chinese definitions for each entry? That
would also be nice, I think.
If you use IRC, I'd like to discuss these and other possibilities with you
sometime. I'm jpwhiting on freenode most of the time.
Jeremy Whiting