Review Request 114717: Language detection in Sonnet

Christoph Feck christoph at maxiom.de
Sun Dec 29 17:46:33 GMT 2013



> On Dec. 29, 2013, 4:39 p.m., Àlex Fiestas wrote:
> > I wonder if we could use https://code.google.com/p/chromium-compact-language-detector/. Apparently it is known to be really small, fast and self-contained. What do you think?

It probably has better detection (it uses quadgrams instead of trigrams) and covers more languages, but it hardly looks "compact": the cld2_generated_quad0720.cc file alone is over 20 megabytes :)


- Christoph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/114717/#review46389
-----------------------------------------------------------


On Dec. 29, 2013, 4:49 a.m., Martin Tobias Holmedahl Sandsmark wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/114717/
> -----------------------------------------------------------
> 
> (Updated Dec. 29, 2013, 4:49 a.m.)
> 
> 
> Review request for kdelibs and KDEPIM.
> 
> 
> Repository: sonnet
> 
> 
> Description
> -------
> 
> I started by merging in the old language detection branch from SVN, improving it as I went along. One improvement was to use QChar's Unicode information instead of shipping our own Unicode code point tables. The old filter class was also replaced with a new tokenizer, most of which I rewrote to simplify it.
> 
> I added kdepim to the reviewers because I remember talking with someone working on PIM stuff on IRC, and he was interested in this (a long time ago, though).
> 
> 
> Diffs
> -----
> 
>   CMakeLists.txt 1fdcf1e
>   README.md 63e2c6a
>   autotests/CMakeLists.txt e9fc573
>   data/CMakeLists.txt PRE-CREATION
>   data/parsetrigrams.cpp PRE-CREATION
>   data/trigrams/af PRE-CREATION
>   data/trigrams/ar PRE-CREATION
>   data/trigrams/az PRE-CREATION
>   data/trigrams/bg PRE-CREATION
>   data/trigrams/ca PRE-CREATION
>   data/trigrams/cs PRE-CREATION
>   data/trigrams/cy PRE-CREATION
>   data/trigrams/da PRE-CREATION
>   data/trigrams/de PRE-CREATION
>   data/trigrams/en PRE-CREATION
>   data/trigrams/es PRE-CREATION
>   data/trigrams/et PRE-CREATION
>   data/trigrams/eu PRE-CREATION
>   data/trigrams/fa PRE-CREATION
>   data/trigrams/fi PRE-CREATION
>   data/trigrams/fr PRE-CREATION
>   data/trigrams/ha PRE-CREATION
>   data/trigrams/hi PRE-CREATION
>   data/trigrams/hr PRE-CREATION
>   data/trigrams/hu PRE-CREATION
>   data/trigrams/id PRE-CREATION
>   data/trigrams/is PRE-CREATION
>   data/trigrams/it PRE-CREATION
>   data/trigrams/ja PRE-CREATION
>   data/trigrams/kk PRE-CREATION
>   data/trigrams/ko PRE-CREATION
>   data/trigrams/ky PRE-CREATION
>   data/trigrams/la PRE-CREATION
>   data/trigrams/lt PRE-CREATION
>   data/trigrams/lv PRE-CREATION
>   data/trigrams/mk PRE-CREATION
>   data/trigrams/mn PRE-CREATION
>   data/trigrams/nb PRE-CREATION
>   data/trigrams/ne PRE-CREATION
>   data/trigrams/nl PRE-CREATION
>   data/trigrams/nr PRE-CREATION
>   data/trigrams/pl PRE-CREATION
>   data/trigrams/ps PRE-CREATION
>   data/trigrams/pt PRE-CREATION
>   data/trigrams/pt_BR PRE-CREATION
>   data/trigrams/pt_PT PRE-CREATION
>   data/trigrams/ro PRE-CREATION
>   data/trigrams/ru PRE-CREATION
>   data/trigrams/sk PRE-CREATION
>   data/trigrams/sl PRE-CREATION
>   data/trigrams/so PRE-CREATION
>   data/trigrams/sq PRE-CREATION
>   data/trigrams/sr PRE-CREATION
>   data/trigrams/ss PRE-CREATION
>   data/trigrams/st PRE-CREATION
>   data/trigrams/sv PRE-CREATION
>   data/trigrams/sw PRE-CREATION
>   data/trigrams/th PRE-CREATION
>   data/trigrams/tl PRE-CREATION
>   data/trigrams/tn PRE-CREATION
>   data/trigrams/tr PRE-CREATION
>   data/trigrams/ts PRE-CREATION
>   data/trigrams/uk PRE-CREATION
>   data/trigrams/ur PRE-CREATION
>   data/trigrams/uz PRE-CREATION
>   data/trigrams/ve PRE-CREATION
>   data/trigrams/vi PRE-CREATION
>   data/trigrams/xh PRE-CREATION
>   data/trigrams/zu PRE-CREATION
>   sonnet.yaml c54f87b
>   src/CMakeLists.txt e79492f
>   src/core/CMakeLists.txt 2f8a184
>   src/core/backgroundchecker.cpp 8b9e983
>   src/core/backgroundchecker_p.h PRE-CREATION
>   src/core/backgroundengine.cpp 3a14d34
>   src/core/backgroundengine_p.h 10f6a27
>   src/core/client_p.h bd3e416
>   src/core/filter.cpp e99d332
>   src/core/filter_p.h 6c7d8c9
>   src/core/globals.h 0c54c96
>   src/core/globals.cpp e57450f
>   src/core/guesslanguage.h PRE-CREATION
>   src/core/guesslanguage.cpp PRE-CREATION
>   src/core/languagefilter.cpp PRE-CREATION
>   src/core/languagefilter_p.h PRE-CREATION
>   src/core/loader.cpp ee8db0e
>   src/core/settings.cpp 095eddb
>   src/core/settings_p.h ee2d22c
>   src/core/speller.h 7428339
>   src/core/speller.cpp 8cc2a1e
>   src/core/textbreaks.cpp PRE-CREATION
>   src/core/textbreaks_p.h PRE-CREATION
>   src/core/tokenizer.cpp PRE-CREATION
>   src/core/tokenizer_p.h PRE-CREATION
>   src/plugins/CMakeLists.txt fc33a97
>   src/plugins/aspell/kspell_aspellclient.h eadb52a
>   src/plugins/enchant/CMakeLists.txt 817db0c
>   src/plugins/enchant/enchantclient.h 25f62eb
>   src/plugins/hspell/CMakeLists.txt e128cb3
>   src/plugins/hspell/kspell_hspellclient.h 966303f
>   src/plugins/hunspell/CMakeLists.txt ccae7f7
>   src/plugins/hunspell/kspell_hunspellclient.h 79638bb
>   src/ui/configui.ui 6532552
>   src/ui/configwidget.cpp 7a5cc99
>   src/ui/dialog.cpp 13ad39d
>   src/ui/highlighter.h 46418b9
>   src/ui/highlighter.cpp 9f31268
>   src/unicode/CMakeLists.txt 1be0a54
>   src/unicode/README f9b8030
>   src/unicode/data/GraphemeBreakProperty.txt 8805f36
>   src/unicode/data/SentenceBreakProperty.txt fc58820
>   src/unicode/data/WordBreakProperty.txt 78c531c
>   src/unicode/parseucd/parseucd.cpp a050140
>   tests/test_dialog.cpp 0579bb2
>   tests/test_highlighter.h 9cf5657
>   tests/test_highlighter.cpp 695a2df
>   tests/test_textedit.cpp 5c02809
> 
> Diff: https://git.reviewboard.kde.org/r/114717/diff/
> 
> 
> Testing
> -------
> 
> Mostly using test_highlighter.
> 
> 
> Thanks,
> 
> Martin Tobias Holmedahl Sandsmark
> 
>



More information about the kde-core-devel mailing list