Finalized proposal for changes to i18n in KF5

Mon Jan 7 23:01:07 GMT 2013

On Sat, Jan 05, 2013 at 06:38:58PM +0100, Chusslove Illich wrote:
> > [: Oswald Buddenhagen :]
> > of course, it would be even better if you strived for submission to qt-
> > project, if at all realistic (for now probably an add-on, but definitely
> > under cla). otherwise you'll see the same effect every other useful lgpl'd
> > qt framework sees sooner or later: it gets re-implemented (if the effort
> > is deemed acceptable by an interested party).
> 
> I'm not opposed to some additional bureaucracy in order to make the
> framework more accessible to potential users. But I'd have to see what it
> actually means, and what could be the tradeoff.
> 
well, the bureaucracy is finding all code which is not fully copyrighted
by you, and determining whether the authors can be reached and agree to
signing the CLA. if not, rewrite the code, or drop the idea.

> As for another party reimplementing the framework, I don't see what factor
> that is.
>
well, it's a personal factor. i'd hate to see my thoroughly
engineered code being displaced by a (possibly even slightly inferior)
competitor just because i can't compete on licensing compatibility
terms.
this is of course a challenge that all of kde frameworks will face, and
all but the most specialized+big ones will lose in the longer term - as
they always have.

> (Hypothetically speaking, though, I don't see it happening: if someone
> wants Gettext-based translation in Qt code but not through Ki18n, I
> expect he will, well, use Gettext directly.)
> 
nobody wants gettext as such (only setting up a compatible workflow is
important, and that mostly "only" in the FOSS world). i couldn't care
less, and i even see it as a disincentive (because i have no direct
control over the data formats and tools, and even if i can change
something, there is the problem of slow independent distribution of
these newer versions).
the real worth of your work (and majority of effort, i suppose) lies in
all that semantic and scripting stuff, including the documentation
(which implies standardization on sane guidelines).

> > [...] make klocalizedstring.h #undef TRANSLATION_CATALOG at the end (may
> > need to use a macro indirection to ensure that the macro is fully expanded
> > already at definition time (see QT_STRINGIFY in qt 5)).
> 
> I looked, but couldn't figure out how to use it in this context.
> 
#define RESOLVE_TRANSLATION_CATALOG(c) (c)
#define i18n(...) i18nd(RESOLVE_TRANSLATION_CATALOG(TRANSLATION_CATALOG), __VA_ARGS__)
#undef RESOLVE_TRANSLATION_CATALOG
#undef TRANSLATION_CATALOG

i'm not sure this actually works ...
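
for reference: the preprocessor expands a macro's replacement list only when the macro is used, not when it is defined, so with the snippet above i18n() would still refer to RESOLVE_TRANSLATION_CATALOG and TRANSLATION_CATALOG after both have been #undef'd. one way around that is to capture the value in an ordinary constant while the macro is still defined. a minimal sketch, assuming a stand-in i18nd() and a made-up constant name (translationCatalogForThisFile), neither of which is ki18n's actual api:

```cpp
#include <string>

// Hypothetical stand-in for the real i18nd(); it only tags the text with
// the catalog name so the effect is visible.
static std::string i18nd(const char *catalog, const char *text)
{
    return std::string(catalog) + ":" + text;
}

// What client code would define before including the header.
#define TRANSLATION_CATALOG "first"

// --- what the header could do (names are assumptions) ---
// Capture the catalog in a constant while the macro is still defined.
namespace {
const char *const translationCatalogForThisFile = TRANSLATION_CATALOG;
}
// i18n() refers to the constant, so it no longer depends on the macro.
#define i18n(...) i18nd(translationCatalogForThisFile, __VA_ARGS__)
#undef TRANSLATION_CATALOG
// --- end of header part ---

// A later redefinition does not affect i18n() any more.
#define TRANSLATION_CATALOG "second"

std::string demo()
{
    return i18n("Open");
}
```

the constant lives in an unnamed namespace, so every translation unit gets its own copy, fixed at the point where the header is included.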

> I'd definitely like to have the version markers visible for all elements.
> For example, so that there is no uncertainty whether a marker was forgotten
> or not.
> 
> I struggled with how to present it, and in particular thought that a
> separate column is an overkill. Maybe have the superscript yet smaller and a
> bit dimmer?
> 
dunno. not sure what vision-impaired people will say to that.
but a separate column is definitely easier to (visually) skip over than
some appendage to the actual items. especially if you put that column
last.

> > and as usual for native-only-in-slavic speakers, some "the"s are missing.
> > i was too lame to record their locations. ^^
> 
> I've given up and put a pox on them.
> 
> (I did toy with another idea though: compute the statistical average of
> the's-per-word for a given class of texts, and then pepper my text
> proportionally.)
> 
lol

> > i don't like the recommendation for extracted vs. disambiguating comments
> > (and closed-source authors will typically do the exact opposite anyway).
> > wouldn't it be sufficient for disambiguation to strongly recommend
> > consistent use of user interface markers, and thus allow all comments to
> > be extracted?

> > the matter of flagging changes is merely tooling-related.
> 
> Yes, but tooling decisions are related to PO convention and workflow.
> There'd be an awful lot of tooling to modify, and modify by adding
> options and not changing the default behavior.
>
the change is trivial: just add a flag to indicate that the comment
changed. the backwards compatible variant would be just (ab)using fuzzy
for that (e.g., set "fuzzy,fuzzcmt", so updated tools see these "soft
fuzzies", while older tools just treat them as normal fuzzies).
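
to make the compatibility argument concrete, here is a rough sketch of how an updated tool and an older tool would read the same flag line. "fuzzcmt" is just the hypothetical flag name from the paragraph above, and the function and enum names are made up for illustration; no existing gettext tool is implied to work this way.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a PO flag line such as "#, fuzzy, fuzzcmt" into individual flags.
static std::vector<std::string> parseFlags(const std::string &line)
{
    std::vector<std::string> flags;
    if (line.compare(0, 2, "#,") != 0)
        return flags; // not a flag line
    std::istringstream in(line.substr(2));
    std::string flag;
    while (std::getline(in, flag, ',')) {
        const auto first = flag.find_first_not_of(" \t");
        if (first == std::string::npos)
            continue; // empty item
        const auto last = flag.find_last_not_of(" \t");
        flags.push_back(flag.substr(first, last - first + 1));
    }
    return flags;
}

static bool hasFlag(const std::vector<std::string> &flags, const std::string &name)
{
    for (const auto &f : flags) {
        if (f == name)
            return true;
    }
    return false;
}

enum class Review { None, CommentOnly, Full };

// Updated tool: "fuzzcmt" next to "fuzzy" marks a "soft fuzzy",
// i.e. only the comment changed.
Review classifyNew(const std::string &flagLine)
{
    const auto flags = parseFlags(flagLine);
    if (!hasFlag(flags, "fuzzy"))
        return Review::None;
    return hasFlag(flags, "fuzzcmt") ? Review::CommentOnly : Review::Full;
}

// Older tool: knows nothing about "fuzzcmt", so every "fuzzy" is a normal fuzzy.
Review classifyOld(const std::string &flagLine)
{
    return hasFlag(parseFlags(flagLine), "fuzzy") ? Review::Full : Review::None;
}
```

with "#, fuzzy, fuzzcmt" the updated tool reports a comment-only review, while the older one falls back to a full review, which is the safe direction to err in.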

> There would also be no practical purpose to having both types of
> contexts, unless there was a significant difference between them.
> 
well, proprietary users don't like exposing their comments, so they want
minimal disambiguation (what the end user will see in the ui anyway).
also, getting people into the habit of defaulting to comments (paired
with consistent use of @markers) makes it less likely that they put an
epic into a disambiguation when they really meant to put it into a
comment (which is easier to edit and produces leaner catalogs).
of course, sometimes the line between comment and disambiguation is a
tad thin (think annotating "%1: %2"), so either variant doesn't work
entirely without thinking...
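
a small sketch of the "default to comments" habit for the "%1: %2" case. i18nc() here is a stand-in that only records the context and the text. the "// i18n:" comment prefix and the @info:status marker follow kde conventions as i understand them, but treat the details as assumptions rather than a reference.

```cpp
#include <string>

// Stand-in for i18nc(): records what would end up as msgctxt and msgid.
struct Entry {
    std::string msgctxt;
    std::string msgid;
};

static Entry i18nc(const char *context, const char *text)
{
    return Entry{context, text};
}

Entry statusLine()
{
    // i18n: %1 is a file name, %2 is its transfer state. The long
    // explanation lives in this extracted comment, so it can be edited
    // without changing the message key.
    return i18nc("@info:status", "%1: %2");
}
```

the disambiguation stays minimal (just the marker), and everything a translator needs beyond that goes into the comment.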

> > one thing i noticed while looking through catalogs is that it often would
> > be useful to be able to declare some kind of hierarchical comments, so
> > that a particular comment could apply to a whole group of strings, without
> > needing to replicate it, or relying on the translators' ability to see the
> > pattern themselves (which is a pipe dream, especially if only some strings
> > in an existing group changed). i suspect that this may turn out "a bit"
> > hard to implement without hacking gettext (and the .po format) ...
> 
> The nicety of not having to manually replicate comments and contexts in
> hierarchical situations,
>
only comments.
contexts need support from both the code side and the tooling side, and
it's a nightmare to get the tooling right for c++ (because it is not
parsable without semantic interpretation, and the preprocessor tops it
off).

> would have to be balanced by introducing yet more i18n-related syntax
> to source files.
> 
from a dev's perspective, i'd prefer a bit of additional syntax over
mindless duplication (or missing information) any day.

> This also means that drive-by i18n fixers would have to pay more
> attention,
>
they usually come for a reason, so they know the context. and those who
do bulk changes can be expected to be familiar with the mechanisms in
general.

> and that code i18n checking tools would have to be smarter.
>
tooling does indeed become more complex, but it's not that bad (if the
syntax is mostly outside c++ itself, i.e., hidden in comments).

> I don't think anything in PO files should change in this case, simply make a
> proper split of information between #. and msgctxt. It is the extraction
> tool, xgettext, that would need changes.
> 
that suggests that you would duplicate the extracted comment into
all msgs it's supposed to apply to. of course that's easiest (it doesn't
need special presentation support) and wholly sufficient for typical
uses (e.g., comment 120 strings with "keyboard key name"), but would
look "funny" in other cases (e.g., an epic which describes the
correlation of three messages).
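
for illustration, the duplication approach boils down to something like the following on the extraction side. this is only a sketch with a made-up function name (emitGroup), not how xgettext is structured, and it skips the escaping of quotes and newlines that real PO output needs.

```cpp
#include <string>
#include <vector>

// Emit one PO entry per message, copying the shared group comment into
// each entry's "#." (extracted comment) line.
// Note: msgids are written as-is; real output would have to escape them.
std::string emitGroup(const std::string &groupComment,
                      const std::vector<std::string> &msgids)
{
    std::string po;
    for (const auto &id : msgids) {
        po += "#. " + groupComment + "\n";
        po += "msgid \"" + id + "\"\n";
        po += "msgstr \"\"\n\n";
    }
    return po;
}
```

calling emitGroup("keyboard key name", {"Enter", "Escape"}) yields two entries that each carry the same "#. keyboard key name" line, which works nicely for short labels and less so for a long comment that describes how several messages relate.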