Sonnet: Trigram data file for Greek
John Salatas
jsalatas at ictpro.gr
Wed Jan 25 05:00:43 UTC 2017
On 2017-01-24 04:03 PM, Christoph Feck wrote:
> On 24.01.2017 19:45, John Salatas wrote:
>> On 2017-01-24 10:40 AM, Christoph Feck wrote:
>>> On 24.01.2017 09:01, John Salatas wrote:
>>>> I just built a trigram data file (attached) for the Greek language using a
>>>> Wikipedia dump as the corpus. Can I just push it to Sonnet's git, or do I
>>>> need to create a Differential review in Phabricator or something?
>>>
>>> Isn't the Greek language automatically detected by its script? As far as
>>> I know, no other language uses the Greek script.
>>
>> That doesn't seem to be the case when Greek is mixed with other languages
>> in the same sentence, which is rather typical in technical documents that
>> contain English terms. For example, the following excerpt from
>>
>> https://el.wikipedia.org/wiki/Linux
>>
>> Το Linux (Λίνουξ) ή GNU/Linux (Γκνού/Λίνουξ), είναι ένα λειτουργικό
>> σύστημα που αποτελείται από ελεύθερο λογισμικό. Η χρήση του είναι
>> παρόμοια με αυτή του Unix, αλλά όλος ο πηγαίος κώδικας του έχει γραφτεί
>> από την αρχή ως ελεύθερο λογισμικό υπό την ελεύθερη άδεια χρήσης GNU
>> General Public License.
>
> Okay, this eventually has to be fixed, but if adding a Greek trigram
> file improves detection, please add one.
Added.
Thanks!
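
P.S. For anyone reading this in the archive, here is a rough sketch of why
script alone is not enough for the excerpt quoted above. It is plain Python
for illustration only, not Sonnet's code and not the format of the attached
trigram file; the function names script_counts and trigram_counts are made
up for this example. It counts letters per script (taken from the first word
of each character's Unicode name) and counts overlapping character trigrams.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per script, using the first word of the Unicode name."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # "GREEK SMALL LETTER ALPHA" -> "GREEK"
            # "LATIN CAPITAL LETTER L"   -> "LATIN"
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts


def trigram_counts(text):
    """Count overlapping character trigrams in lowercased text.

    Runs of whitespace are collapsed to a single space first, so word
    boundaries show up as trigrams that contain one space.
    """
    normalised = " ".join(text.lower().split())
    return Counter(normalised[i:i + 3] for i in range(len(normalised) - 2))


if __name__ == "__main__":
    # First sentence fragment of the excerpt quoted in this thread.
    sample = (
        "Το Linux (Λίνουξ) ή GNU/Linux (Γκνού/Λίνουξ), "
        "είναι ένα λειτουργικό σύστημα"
    )
    print(script_counts(sample))
    print(trigram_counts(sample).most_common(5))
```

On a sentence like this the script count comes back mixed (both GREEK and
LATIN letters), so a detector that looks only at script has no single answer,
while trigram statistics built from a Greek corpus still give it something to
score the Greek parts against.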