<table><tr><td style="">waqar added a comment.
</td><a style="text-decoration: none; padding: 4px 8px; margin: 0 8px 8px; float: right; color: #464C5C; font-weight: bold; border-radius: 3px; background-color: #F7F7F9; background-image: linear-gradient(to bottom,#fff,#f1f0f1); display: inline-block; border: 1px solid rgba(71,87,120,.2);" href="https://phabricator.kde.org/D25495">View Revision</a></tr></table><br /><div><div><p>Hi,<br />
First of all, thanks for reviewing.</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><p>I'd suggest to move your changes to GuessLanguage::identify(const QString &text, const QStringList &suggestionsListIn) after the call to d->identify(text, d->findRuns(text));</p></blockquote>
<p>Okay. I will do that, but I will have to move the <tt style="background: #ebebeb; font-size: 13px;">d->findRuns(text)</tt> out of the function call.</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><p>but only add those languages for which there is a dictionary</p></blockquote>
<p>I think that will not be an issue, because <tt style="background: #ebebeb; font-size: 13px;">s_scriptLanguages</tt> only contains languages for which a dictionary is installed. To make my point clear: if, for example, you don't have an English dictionary installed, Sonnet will never guess English as the language of a text.</p>
<p>The resulting changes look like this:</p>
<div class="remarkup-code-block" style="margin: 12px 0;" data-code-lang="text" data-sigil="remarkup-code-block"><pre class="remarkup-code" style="font: 11px/15px "Menlo", "Consolas", "Monaco", monospace; padding: 12px; margin: 0; background: rgba(71, 87, 120, 0.08);"> //get the scripts for current text
auto scriptsList = d->findRuns(text);
//try guessing from trigrams
QStringList candidateLanguages = d->identify(text, scriptsList);
if (candidateLanguages.isEmpty() && !scriptsList.isEmpty()) {
    for (const QChar::Script script : scriptsList) {
        const auto languagesList = d->s_scriptLanguages.values(script);
        for (const auto &lang : languagesList) {
            //if trigrams don't have this language then add it to the candidates
            if (!d->s_knownModels.contains(lang))
                candidateLanguages.append(lang);
        }
    }
}</pre></div>
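<p>To make the fallback easier to reason about outside of Sonnet, here is a minimal, Qt-free sketch of the same idea. Everything in it is a hypothetical stand-in and not Sonnet API: <tt style="background: #ebebeb; font-size: 13px;">Script</tt> replaces <tt style="background: #ebebeb; font-size: 13px;">QChar::Script</tt>, a <tt style="background: #ebebeb; font-size: 13px;">std::multimap</tt> plays the role of <tt style="background: #ebebeb; font-size: 13px;">s_scriptLanguages</tt>, a <tt style="background: #ebebeb; font-size: 13px;">std::set</tt> plays the role of <tt style="background: #ebebeb; font-size: 13px;">s_knownModels</tt>, and <tt style="background: #ebebeb; font-size: 13px;">withScriptFallback</tt> is a name made up for illustration.</p>

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for QChar::Script.
enum class Script { Latin, Cyrillic, Arabic };

// Hypothetical stand-ins for the private Sonnet tables:
//   ScriptLanguages ~ s_scriptLanguages (script -> languages with a dictionary)
//   KnownModels     ~ s_knownModels     (languages that have a trigram model)
using ScriptLanguages = std::multimap<Script, std::string>;
using KnownModels = std::set<std::string>;

// If trigram detection produced no candidates, fall back to every dictionary
// language of the detected scripts that has no trigram model.
std::vector<std::string> withScriptFallback(std::vector<std::string> candidates,
                                            const std::vector<Script> &scripts,
                                            const ScriptLanguages &scriptLanguages,
                                            const KnownModels &knownModels)
{
    if (!candidates.empty()) {
        return candidates; // the trigram result is kept unchanged
    }
    for (const Script script : scripts) {
        const auto range = scriptLanguages.equal_range(script);
        for (auto it = range.first; it != range.second; ++it) {
            // Only languages the trigram models cannot identify are added.
            if (knownModels.count(it->second) == 0) {
                candidates.push_back(it->second);
            }
        }
    }
    return candidates;
}
```

<p>With a table that maps the Arabic script to two languages, only one of which has a trigram model, an empty trigram result would yield just the language without a model. A non-empty trigram result is returned as is.</p>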
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><p>There is also a bug in GuessLanguagePrivate::guessFromTrigrams(const QString &sample, const QStringList &languages): if m_minConfidence is left to its default value of '0', that function will always return an empty list. I will propose a fix shortly.</p></blockquote>
<p>Alright, I am looking forward to it.</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><p>The real issue behind Bug 176537 is a different one, however. On-the-fly spell checking in Kate(Part) will only check one line at a time, potentially not providing enough text for a meaningful language detection.</p></blockquote>
<p>To be honest, I have never had an issue with that. I mostly test on QOwnNotes, where spell checking works the same way, i.e., one line at a time. If a dictionary is present, Sonnet guesses the language correctly most of the time. But you are right that more text would make Sonnet more accurate. However, autodetection works on a sentence basis, and sentences can sometimes be quite short.</p>
<blockquote style="border-left: 3px solid #a7b5bf; color: #464c5c; font-style: italic; margin: 4px 0 12px 0; padding: 4px 12px; background-color: #f8f9fc;"><p>I plan to perform the language detection inside KatePart, so that there is also feedback regarding the detected language that is shown to the user, who can then also override the detected language, if desired.</p></blockquote>
<p>That would be really cool!<br />
I guess the rest of the dictionaries (of the same script) can be shown in the context menu to allow the user to override the detected language.</p></div></div><br /><div><strong>REPOSITORY</strong><div><div>R246 Sonnet</div></div></div><br /><div><strong>REVISION DETAIL</strong><div><a href="https://phabricator.kde.org/D25495">https://phabricator.kde.org/D25495</a></div></div><br /><div><strong>To: </strong>waqar, mludwig, cullmann<br /><strong>Cc: </strong>ognarb, kde-frameworks-devel, LeGast00n, GB_2, michaelh, ngraham, bruns<br /></div>