Sonnet: Trigram data file for Greek

Christoph Feck cfeck at kde.org
Wed Jan 25 00:03:00 UTC 2017


On 24.01.2017 19:45, John Salatas wrote:
> On 2017-01-24 10:40 AM, Christoph Feck wrote:
>> On 24.01.2017 09:01, John Salatas wrote:
>>> I just built a trigram data file (attached) for the Greek language
>>> using a Wikipedia dump as the corpus. Can I just push it to Sonnet's
>>> git, or do I need to create a differential review in Phabricator or
>>> something?
>>
>> Isn't the Greek language automatically detected by its Greek script?
>> As far as I know, no other language uses the Greek script.
>
> That doesn't seem to be the case when Greek is mixed with other
> languages in the same sentence, which is rather typical in technical
> documents that contain English terms. For example, the following
> excerpt from
>
> https://el.wikipedia.org/wiki/Linux
>
> Το Linux (Λίνουξ) ή GNU/Linux (Γκνού/Λίνουξ), είναι ένα λειτουργικό
> σύστημα που αποτελείται από ελεύθερο λογισμικό. Η χρήση του είναι
> παρόμοια με αυτή του Unix, αλλά όλος ο πηγαίος κώδικας του έχει γραφτεί
> από την αρχή ως ελεύθερο λογισμικό υπό την ελεύθερη άδεια χρήσης GNU
> General Public License.

Okay, this will have to be fixed eventually, but if adding a Greek
trigram file improves detection, please add one.
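
To illustrate the mixed-script case discussed above, here is a rough
Python sketch. It is not Sonnet's code; the helper names script_counts
and top_trigrams are made up for this example, and reading the script
from the first word of the Unicode character name is a simplifying
assumption. It counts letters per script in the first clause of the
quoted excerpt, and shows how character trigrams could be tallied from
a text.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters by script (hypothetical helper, not Sonnet's API)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Assumption: the first word of the Unicode character name
            # identifies the script, e.g. "GREEK" or "LATIN".
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts


def top_trigrams(text, n=5):
    """Return the n most frequent character trigrams (hypothetical helper)."""
    # Lowercase and collapse runs of whitespace before sliding a 3-char window.
    t = " ".join(text.lower().split())
    return Counter(t[i:i + 3] for i in range(len(t) - 2)).most_common(n)


sample = ("Το Linux (Λίνουξ) ή GNU/Linux (Γκνού/Λίνουξ), "
          "είναι ένα λειτουργικό σύστημα")
counts = script_counts(sample)

if __name__ == "__main__":
    print(counts)
    print(top_trigrams(sample))
```

Even in this short clause a noticeable share of the letters is Latin,
so a check based on script alone sees mixed input, while trigram
statistics for Greek give the detector something more to go on.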


