Sonnet: Trigram data file for Greek

John Salatas jsalatas at ictpro.gr
Wed Jan 25 05:00:43 UTC 2017


On 2017-01-24 04:03 PM, Christoph Feck wrote:
> On 24.01.2017 19:45, John Salatas wrote:
>> On 2017-01-24 10:40 AM, Christoph Feck wrote:
>>> On 24.01.2017 09:01, John Salatas wrote:
>>>> I just built a trigram data file (attached) for the Greek language using
>>>> a Wikipedia dump as the corpus. Can I just push it to Sonnet's git, or do
>>>> I need to create a Differential review in Phabricator or something?
>>> 
>>> Isn't the Greek language automatically detected by the Greek script? As
>>> far as I know, no other language uses the Greek script.
>> 
>> That doesn't seem to be the case when Greek is mixed with other languages
>> in the same sentence, which is rather typical in technical documents that
>> contain English terms. For example, the following excerpt from
>> 
>> https://el.wikipedia.org/wiki/Linux
>> 
>> Το Linux (Λίνουξ) ή GNU/Linux (Γκνού/Λίνουξ), είναι ένα λειτουργικό
>> σύστημα που αποτελείται από ελεύθερο λογισμικό. Η χρήση του είναι
>> παρόμοια με αυτή του Unix, αλλά όλος ο πηγαίος κώδικας του έχει γραφτεί
>> από την αρχή ως ελεύθερο λογισμικό υπό την ελεύθερη άδεια χρήσης GNU
>> General Public License.
> 
> Okay, this eventually has to be fixed, but if adding a Greek trigram
> file improves detection, please add one.

Added.
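
To illustrate the mixed-script point with the excerpt quoted above, here is a minimal sketch in Python (chosen for brevity; it is not Sonnet's actual implementation or data format). The helper names `script_counts` and `trigrams` are hypothetical. The first shows that the sentence contains letters from both the Greek and Latin scripts, so the script alone does not settle the language. The second shows the kind of character-trigram counts that a trigram data file is generally built from.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per script, taking the first word of each
    character's Unicode name (e.g. "GREEK", "LATIN") as the script."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Fall back to "UNKNOWN" for characters without a Unicode name.
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts


def trigrams(text):
    """Count overlapping character trigrams after lowercasing
    and collapsing runs of whitespace into single spaces."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


# First sentence fragment of the excerpt from el.wikipedia.org/wiki/Linux
sample = ("Το Linux (Λίνουξ) ή GNU/Linux (Γκνού/Λίνουξ), "
          "είναι ένα λειτουργικό σύστημα")

if __name__ == "__main__":
    print(script_counts(sample))            # both GREEK and LATIN appear
    print(trigrams(sample).most_common(5))  # most frequent trigrams
```

A real detector would compare such trigram counts against per-language profiles; how Sonnet ranks and stores them is not covered here.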


Thanks!



More information about the Kde-frameworks-devel mailing list