D25495: Fix Sonnet autodetect failing on Indian langs

Michel Ludwig noreply at phabricator.kde.org
Fri Jan 3 12:39:20 GMT 2020


mludwig added a comment.


  Thanks for the updated patch!
  
  In D25495#585855 <https://phabricator.kde.org/D25495#585855>, @waqar wrote:
  
  > > There is also a bug in GuessLanguagePrivate::guessFromTrigrams(const QString &sample, const QStringList &languages): if m_minConfidence is left to its default value of '0', that function will always return an empty list. I will propose a fix shortly.
  >
  > Alright, I am excited to hear.
  
  
  Here is my proposal:
  
  https://phabricator.kde.org/D26346
  
  > > The real issue behind Bug 176537 is a different one, however. On-the-fly spell checking in Kate(Part) will only check one line at a time, potentially not providing enough text for a meaningful language detection.
  >
  > To be honest, I haven't ever had an issue with that. I mostly test on QOwnNotes, and spell checking works the same way there, i.e., one line at a time. If there is a dictionary present, Sonnet will guess the language correctly most of the time. But you are right that more text would enable Sonnet to be more accurate. However, autodetection works on a sentence basis, and sentences can sometimes be quite short.
  
  With your patch, language detection works much better in KatePart. However, due to the way Sonnet is used in KatePart for on-the-fly spell checking, one remaining issue is that every word is basically checked against every dictionary. So, in the sentence "English is an interesting langage", "langage" may not be considered misspelled, since "langage" is a correctly spelled French word, for example. I will still work on that.
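
  A minimal sketch of that effect, using plain standard-library containers rather than Sonnet's API. The `Dictionaries` type, both helper functions, and the tiny word lists are hypothetical and only serve to illustrate the point:

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a set of spelling dictionaries, keyed by language code.
using Dictionaries = std::map<std::string, std::set<std::string>>;

// Tiny made-up word lists, just large enough for the example sentence.
Dictionaries sampleDictionaries()
{
    return {
        {"en", {"English", "is", "an", "interesting", "language"}},
        {"fr", {"langage", "est", "une"}},
    };
}

// A word passes if *any* dictionary knows it (the behaviour described above).
bool acceptedByAnyDictionary(const Dictionaries &dicts, const std::string &word)
{
    for (const auto &entry : dicts) {
        if (entry.second.count(word) > 0) {
            return true;
        }
    }
    return false;
}

// A word passes only if the dictionary of the given language knows it.
bool acceptedByLanguage(const Dictionaries &dicts, const std::string &lang,
                        const std::string &word)
{
    const auto it = dicts.find(lang);
    return it != dicts.end() && it->second.count(word) > 0;
}
```

  With these word lists, `acceptedByAnyDictionary(sampleDictionaries(), "langage")` is true because of the French list, while `acceptedByLanguage(sampleDictionaries(), "en", "langage")` is false, which is the result one would want for an English sentence.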

INLINE COMMENTS

> guesslanguage.cpp:589
> +    //if guessing from trigrams fail
> +    if (candidateLanguages.isEmpty() && !scriptsList.isEmpty()) {
> +        for (const QChar::Script script : scriptsList) {

Couldn't this if-statement be dropped? I guess one can argue that sometimes a language without trigrams may even be the better guess.
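
To illustrate what dropping the condition could look like, here is a rough sketch with standard-library types instead of the Qt types used in the patch. It assumes the fallback contributes a list of languages derived from the detected scripts; `mergeCandidates` and both parameter names are hypothetical and this is not the code in guesslanguage.cpp:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: always append the script-derived languages to the
// trigram candidates (no isEmpty() check), skipping ones already present.
// Trigram results keep their position at the front of the list.
std::vector<std::string> mergeCandidates(std::vector<std::string> fromTrigrams,
                                         const std::vector<std::string> &fromScripts)
{
    for (const std::string &lang : fromScripts) {
        const bool known = std::find(fromTrigrams.begin(), fromTrigrams.end(), lang)
                           != fromTrigrams.end();
        if (!known) {
            fromTrigrams.push_back(lang);
        }
    }
    return fromTrigrams;
}
```

In this form, a language that has no trigram data still ends up in the candidate list even when the trigram guess returned something.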

REPOSITORY
  R246 Sonnet

REVISION DETAIL
  https://phabricator.kde.org/D25495

To: waqar, mludwig, cullmann
Cc: ognarb, kde-frameworks-devel, LeGast00n, GB_2, michaelh, ngraham, bruns