D25495: Fix Sonnet autodetect failing on Indian langs
Waqar Ahmed
noreply at phabricator.kde.org
Wed Jan 1 15:58:55 GMT 2020
waqar added a comment.
Hi,
First of all, thanks for reviewing.
> I'd suggest to move your changes to GuessLanguage::identify(const QString &text, const QStringList &suggestionsListIn) after the call to d->identify(text, d->findRuns(text));
Okay, I will do that, but I will have to move the `d->findRuns(text)` call out of the argument list so that its result can be reused.
> but only add those languages for which there is a dictionary
I don't think that will be an issue, because `s_scriptLanguages` only contains the languages for which there are dictionaries. To make my point clear: if, for example, you don't have an English dictionary installed, Sonnet will never be able to guess the language of the text.
The resulting changes look like this:
    // get the scripts for current text
    auto scriptsList = d->findRuns(text);

    // try guessing from trigrams
    QStringList candidateLanguages = d->identify(text, scriptsList);

    if (candidateLanguages.isEmpty() && !scriptsList.isEmpty()) {
        for (const QChar::Script script : scriptsList) {
            const auto languagesList = d->s_scriptLanguages.values(script);
            for (const auto &lang : languagesList) {
                // if trigrams don't have this language then add it to the candidates
                if (!d->s_knownModels.contains(lang))
                    candidateLanguages.append(lang);
            }
        }
    }
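To make the intent of the fallback easier to follow outside of Sonnet, here is a minimal sketch of the same logic in plain standard C++, without Qt. It is not the actual patch: the `Script` enum, the `ScriptLanguages` and `KnownModels` aliases, and the function name `candidatesWithScriptFallback` are hypothetical stand-ins for `QChar::Script`, `s_scriptLanguages`, `s_knownModels` and the code above.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for QChar::Script; only a few values for illustration.
enum class Script { Latin, Devanagari, Bengali };

// Assumed shapes of the two lookup tables:
//   ScriptLanguages ~ s_scriptLanguages (script -> languages with a dictionary)
//   KnownModels     ~ s_knownModels     (languages that have a trigram model)
using ScriptLanguages = std::multimap<Script, std::string>;
using KnownModels = std::set<std::string>;

// If the trigram step produced candidates, return them unchanged.
// Otherwise fall back to the languages of the detected scripts,
// keeping only those that have no trigram model.
std::vector<std::string> candidatesWithScriptFallback(
    std::vector<std::string> trigramCandidates,
    const std::vector<Script> &scripts,
    const ScriptLanguages &scriptLanguages,
    const KnownModels &knownModels)
{
    if (!trigramCandidates.empty()) {
        return trigramCandidates;
    }
    for (Script script : scripts) {
        const auto range = scriptLanguages.equal_range(script);
        for (auto it = range.first; it != range.second; ++it) {
            // Only add languages the trigram step could not have known about.
            if (knownModels.count(it->second) == 0) {
                trigramCandidates.push_back(it->second);
            }
        }
    }
    return trigramCandidates;
}
```

With illustrative data where only "en" has a trigram model, calling this with an empty candidate list and a Devanagari run would yield the Devanagari languages from the script table, while a non-empty trigram result is passed through untouched.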
> There is also a bug in GuessLanguagePrivate::guessFromTrigrams(const QString &sample, const QStringList &languages): if m_minConfidence is left to its default value of '0', that function will always return an empty list. I will propose a fix shortly.
Alright, I look forward to it.
> The real issue behind Bug 176537 is a different one, however. On-the-fly spell checking in Kate(Part) will only check one line at a time, potentially not providing enough text for a meaningful language detection.
To be honest, I have never had an issue with that. I mostly test on QOwnNotes, and spellchecking works the same way there, i.e. one line at a time. If a dictionary is present, Sonnet guesses the language correctly most of the time. But you are right that more text would enable Sonnet to be more accurate. However, autodetection works on a per-sentence basis, and sentences can sometimes be quite short.
> I plan to perform the language detection inside KatePart, so that there is also feedback regarding the detected language that is shown to the user, who can then also override the detected language, if desired.
That would be really cool!
I guess the rest of the dictionaries (of the same script) can be shown in the context menu to allow the user to override the detected language.
REPOSITORY
R246 Sonnet
REVISION DETAIL
https://phabricator.kde.org/D25495
To: waqar, mludwig, cullmann
Cc: ognarb, kde-frameworks-devel, LeGast00n, GB_2, michaelh, ngraham, bruns