Review Request 128243: Fix parsing of env vars in KLocalizedStringPrivateStatics::initializeLocaleLanguages()

Friedrich W. H. Kossebau kossebau at kde.org
Sun Jun 26 19:27:43 UTC 2016



> On June 18, 2016, 11:11 p.m., Chusslove Illich wrote:
> > a) Right, old code is obviously wrong (sigh).
> > 
> > b) According to Gettext convention, LANGUAGE is a fine-tuning variable and it (each of its colon-separated parts) is to be taken as-is, without the system trying to be smart.
> 
> Friedrich W. H. Kossebau wrote:
>     a) possibly stayed undetected as there might be no locales in real world which have both modifier and charset set. Still is better to have the code straight, less confusing for anyone who might come to read it (like me :) ).
>     
>     b) My (recently gained) theoretic knowledge about the LANGUAGE env var is mainly from ABOUT-NLS, and there it says "'LL_CC' combinations can be abbreviated as 'LL' to
>     denote the language's main dialect.". From which I had guessed that e.g. both "LANGUAGE=ru_RU:foo:bar" and "LANGUAGE=ru:foo:bar" would result in the LOCALEPREFIX/ru/LC_MESSAGES/" catalogs being picked up (at least when there are no ru_RU of course). Being a newbie on this field, I miss real-world experience surely and just can talk from what I understood so far and see on my system.
>     
>     Now, I have not yet got to understanding the code in the gettext lib, but from testing on commandline "LANGUAGE=ru_RU:de:en" will result in the catalogs from the ru folder being used e.g. for the coreutils tool "ls":
>     ```
>     LANG=de_DE.UTF-8 LC_MESSAGES= LC_ALL= LANGUAGE=ru_RU.UTF-8:de:en ls --help
>     ```
>     (yes, even using charset here, to show that even that is dealt with when set, though perhaps just ignored)
>     
>     Doing the same with a ki18n app does not work:
>     ```
>     LANG=de_DE.UTF-8 LC_MESSAGES= LC_ALL= LANGUAGE=ru_RU.UTF-8:de:en okteta
>     
>     ```
>     will with the current code not result in russian translations in "ru" folder being used for ki18n-based translations, but the "de_DE" on.
>     More, other than ki18n, the Qt system locale seems to pick up the LANGUAGE setting and goes fur russian, at least QLocale::system(), as used by kf5 code for picking the language to use with QTranslator, results in russian catalogs ("*_ru.qm") being loaded and used.
>     Can be seen e.g. in the menu string "Show Toolbar" (in "Settings" menu, from kconfig5_qt.po) or the print dialog, where most string are from Qt lib catalog.
>     Which results in a language mix in the UI.
>     
>     Only when using just the language code "ru", matching exactly the folder name with the catalog, this will give me also russian translations for anything based on ki18n:
>     ```
>     LANG=de_DE.UTF-8 LC_MESSAGES= LC_ALL= LANGUAGE=ru:de:en okteta
>     ```
>     
>     So at least on my openSUSE Tumbleweed system both the gettext lib and the Qt locale code seem to have a different treatment of the LANGUAGE env var than what the current ki18n code does.
>     No idea/experience if that is a problem in the real world though with real world usage of values for the LANGUAGE var. Just hit this issues during naive/clueless translation handling testing and playing around with all the vars. Still, with my current knowledge I would feel better to align the behaviour of ki18n more with the others.
> 
> Friedrich W. H. Kossebau wrote:
>     Possibly things need even more fixing. So values in the list set for "LANGUAGE" seem to be not only taken as they are but instead are also tried in stripped variants (so "ru" catalogs are taken for "LANGUAGE=ru_RU") from what I see. But that comment from gettext's ABOUT-NLS I cited earlier ("'LL_CC' combinations can be abbreviated as 'LL' to denote the language's main dialect.") might also mean ki18n needs to check for a catalog with the language's main dialect, in case there is none for for just the language itself?
>     
>     Question which currently prevents me from understanding more: how to know the language's main dialect? Is that "ll_LL", so the same letters as used for the country as used for the language? (I learned that LANG has to have the country always set in the given code, so it is not a problem there)
>     
>     So with "LANGUAGE=ll" if there is no "ll/LC_MESSAGES/foo.mo" would we also need to check for "ll_LL/LC_MESSAGES/foo.mo"?

The code at least of the gnu implementation seems to not make a different handling of the values for LANGUAGE over the values for LANG, LC_MESSAGES or LC_ALL. Is that a GNU exception of the rule "to be taken as-is"? But than the whole var LANGUAGE is a GNU extension from what I read.

I followed KCatalog::translate -> dgettext -> dcgettext -> dcigettext. There `guess_category_value()` is called to get the value from the env vars  (see https://code.woboq.org/gcc/intl/dcigettext.c.html#565). Which adheres to the known preferences, choosing LANGUAGE over LANG, LC_MESSAGES or LC_ALL (latter three cared for by `__current_locale_name()` or `_nl_locale_name(...)`, where at least the latter also just takes the values as given, without further processing). The returned value then is processed further without making a difference whether it came from LANGUAGES or the other vars and passes all of them (split by ":" char) to `_nl_find_domain()` in the same way.

So both by behaviour and by how I understand the linked implementation, the patch as proposed here should be an improvement :) Please tell me what I missed otherwise which should stop this patch.


- Friedrich W. H.


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://git.reviewboard.kde.org/r/128243/#review96702
-----------------------------------------------------------


On June 18, 2016, 8:36 p.m., Friedrich W. H. Kossebau wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://git.reviewboard.kde.org/r/128243/
> -----------------------------------------------------------
> 
> (Updated June 18, 2016, 8:36 p.m.)
> 
> 
> Review request for KDE Frameworks and Chusslove Illich.
> 
> 
> Repository: ki18n
> 
> 
> Description
> -------
> 
> This patch fixes two things:
> 
> a) `splitLocale(...)` had the wrong order of splitting off modifier and codeset: the old code was first splitting off anything behind "." as charset (which would include the modifier though if present) and only then seeing to split off anything behind "@" as modifier. So with both charset and modifier present this would fail.
> 
> b) The locales listed in "LANGUAGE" would be, other than those in "LC_ALL", "LC_MESSAGES" and "LANG", only added as they are to the list of `localeLanguages`, without generating their variants. That seems unbalanced to me, as it would mean KCatalog not properly detecting mo files e.g. in "/usr/share/locale/ru/LC_MESSAGES/" subfolder with "LANGUAGE=ru_RU.UTF-8", "LANG=", "LC_ALL=", "LC_MESSAGES=". Or is that on purpose?
> 
> 
> Diffs
> -----
> 
>   src/klocalizedstring.cpp fc80135 
> 
> Diff: https://git.reviewboard.kde.org/r/128243/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Friedrich W. H. Kossebau
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20160626/18c34f2c/attachment.html>


More information about the Kde-frameworks-devel mailing list