Translation in Qt5

Sun Jul 3 15:24:53 BST 2011

On Sun, Jul 03, 2011 at 01:06:40AM +0200, Chusslove Illich wrote:
> >> [: Chusslove Illich :]
> >> * PO has to be natively supported. [...] The real advantage instead is in
> >> the format and the process; and also tool support on translators' side.
> >
> > [: Oswald Buddenhagen :]
> > how is the process fundamentally different from the linguist toolchain?
> > why would the format matter for anything other than the tool support?
> 
> First of all, this has to be looked at from the other side -- why would one
> replace PO (format, tools, process) with Linguist? If there were indeed
> no fundamental difference, then there would also be no need to replace it.
> To consider replacing PO, Linguist (format, tools, process) would have to
> offer some advantages, and I see none.
> 
> Having said that, here are the overall advantages of PO:
> 
> * PO extraction is unified for many programming languages (over 20) and even
> environments within certain languages (KDE, Qt). I do not have to switch my
> POT creation tool when working across these languages/environments.
> 
well, below you noted that this isn't too important.

> * PO format is a low-fat highly-informative text format, perfectly manually
> editable.
>
symbian had .loc files before qt arrived. these files looked suspiciously
similar to .po files. when we came along and a "nokia-wide translation
format harmonization effort" was started, they made it a hard requirement
for the format to be xml-based - because translators routinely botched
their .loc files.
of course this indicates inadequate tooling, but one could argue that
breaking xml is harder when using generic tools (that is, the most
specialized tools that are still generic: for xml, that means generic xml
editors, xml parser & writer libraries, etc., not a plain text editor).

> the other is when you are making something new and do not dare come up with
> a custom format, lest you make a mess with ad-hoc extensions in the future.
>
well, we made some additions to .ts during the symbianization effort. we
added some meta data (which in .po requires some non-standard magic syntax:
"#. qt-id foo") and the dreaded length variants (which we represent with
a magic separator in .po files; this mostly undocumented feature will be
removed again in qt5).

> * PO merging is a critical element of the PO process, and I don't know any
> chain other than Linguist which has merging defined at all. And for Linguist
> I am not entirely sure that it has merging that well-defined; well, unless
> you examined in detail what msgmerge does :)
> 
i didn't look into the code (too much "inspiration" would be legally
questionable), but it seems likely that lupdate's merging is inferior.
that's nothing that couldn't be fixed, though.
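
to make "merging" concrete, here is a minimal sketch of what a msgmerge-style merge does, under heavily simplified assumptions: a catalog is a map keyed by (context, source text), exact matches keep their translation, and a message whose text survived under a different context gets the old translation marked fuzzy. the types and `merge()` are hypothetical, and the fuzzy rule is a crude stand-in for the string-similarity search real msgmerge performs. old entries missing from the template end up obsolete, similar to msgmerge's #~ entries.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// hypothetical, simplified catalog model: a message is keyed by (context, source text).
using Key = std::pair<std::string, std::string>;

struct Entry {
    std::string translation;  // empty means untranslated
    bool fuzzy = false;       // translation needs review
    bool obsolete = false;    // message vanished from the sources
};

using Catalog = std::map<Key, Entry>;

// merges an existing translated catalog with a freshly extracted template:
// - exact (context, text) match: translation and fuzzy flag carried over
// - same text under another context: translation copied, marked fuzzy
//   (a crude stand-in for msgmerge's similarity-based fuzzy matching)
// - otherwise: left untranslated
// old entries that are not in the template are kept, but marked obsolete.
Catalog merge(const Catalog& old, const std::vector<Key>& templ) {
    Catalog result;
    for (const Key& key : templ) {
        Entry e;
        auto it = old.find(key);
        if (it != old.end()) {
            e.translation = it->second.translation;
            e.fuzzy = it->second.fuzzy;
        } else {
            for (const auto& [oldKey, oldEntry] : old) {
                if (oldKey.second == key.second && !oldEntry.translation.empty()) {
                    e.translation = oldEntry.translation;
                    e.fuzzy = true;
                    break;
                }
            }
        }
        result[key] = e;
    }
    for (const auto& [oldKey, oldEntry] : old) {
        if (result.count(oldKey) == 0) {
            Entry e = oldEntry;
            e.obsolete = true;
            result[oldKey] = e;
        }
    }
    return result;
}
```

a real implementation would also carry over translator comments and flags; this one only shows the matching policy.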

> > [...] qt tries to have no external tool dependencies, because they are a
> > hassle on anything except linux. i'm also not sure how the requirement for
> > using gpl'd tools would resonate with some of the proprietary qt
> > customers.
> 
> I can't see how GPL would matter for them in this. If it's about a "fuzzy
> bad feeling", I simply don't care.
> 
it's a tad more than that. some organizations (military and oil industry
are the canonical examples) are allergic to anything with *gpl in it.
while they can go and screw themselves as far as i'm concerned, this
position may be somewhat hard to sell higher up the food chain. it's one
of the main reasons why that non-free contribution agreement is in
place.

> But I do see how external tool dependencies matter, given that Qt was always
> meant to be quite a self-contained solution. Therefore I have nothing
> against keeping around l* tools for that purpose. But msg* tools have to be
> an almost drop-in replacement for them (it's ok if precise command sequence
> and options differ), for the projects that have no problem in using them.
> 
that in turn makes no sense at all to me. either the l* chain is good
enough and there is no point in using gettext, or it is not, and then it
should be improved or replaced. there is little alternative to that in
the context of the kde frameworks properly joining the qt ecosystem.

> I actually thought you intended QStringFormatter to be unrelated to
> translations. I'm not sure it is smart to relate it directly to
> translations.
>
i already answered that:
> > i don't see much point in a QStringFormatter unrelated to
> > translations - it just seemed like a generalization bonus.

> E.g. what would indicate which string should be extracted and which
> not? (KLocalizedString cannot be directly constructed, it has to go
> through one of i18n*()/ki18n*() calls.)
>
that wouldn't be different with qTr().

> I think the proper chain would be QStringTranslator ->
> QStringFormatter -> QString.
> 
the questions would be a) what would be the benefit (especially given
that we'll need to keep arg() in qt5 anyway) and b) what would be the
performance impact of such chaining?

> >>> for advanced formatting, i'm envisioning the syntax %[12.34h]1, i.e.,
> >>> sprintf-like options in brackets.
> >>
> > the use case is adjusting the format to available size, which is a very
> > real problem on small devices.
> 
> Hm... as in, the translator may need to steal a bit from the argument length
> in order to make the rest of the text fit? Something smells wrong there...
> 
it's even more wrong when you cut characters while dealing with a pixel
width and a proportional font. you wouldn't believe how hard it is to
convince certain nokians of this rather obvious reality ... :}
anyway, more valid use cases are:
- choosing a less/more verbose date format
- using less padding, thus trading truncation for risk of misalignment
- choosing a digit style depending on context (think arabian 6)
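
a minimal sketch of how the bracketed options could be parsed, assuming the syntax is exactly `%N` or `%[options]N` with N a decimal argument number. what the options mean (date verbosity, padding, digit style) is left to a hypothetical per-argument callback; `expand()` and `ArgFormatter` are illustrative names, not existing qt api.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// hypothetical callback: renders argument `index` (1-based) according to the
// translator-supplied option string, which is empty for a plain "%N".
using ArgFormatter = std::function<std::string(int index, const std::string& options)>;

// expands "%N" and "%[options]N" placeholders; anything that is not a
// well-formed placeholder (e.g. "100% done") is copied through verbatim.
std::string expand(const std::string& fmt, const ArgFormatter& format) {
    std::string out;
    std::size_t i = 0;
    while (i < fmt.size()) {
        if (fmt[i] != '%' || i + 1 >= fmt.size()) {
            out += fmt[i++];
            continue;
        }
        std::size_t j = i + 1;
        std::string options;
        if (fmt[j] == '[') {
            std::size_t close = fmt.find(']', j);
            if (close == std::string::npos) {  // unterminated option list
                out += fmt[i++];
                continue;
            }
            options = fmt.substr(j + 1, close - j - 1);
            j = close + 1;
        }
        std::size_t digits = j;
        while (j < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[j]))) ++j;
        if (j == digits) {  // no argument number: not a placeholder
            out += fmt[i++];
            continue;
        }
        out += format(std::stoi(fmt.substr(digits, j - digits)), options);
        i = j;
    }
    return out;
}
```

keeping the option string opaque to the parser means the set of supported options can grow without touching the placeholder syntax.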

> In the "perfect text translation library" I would like argument placeholders
> to be named and fully enclosed in mirror-character wrappers.
> E.g. with braces and in Python, it could look like this:
> 
>   i18n("Notification from {appname}", appname=...)
>   i18n("Allow access to {service} by {username}?", service=..., username=...)
> 
i see some problems with that:
1) while safer, this is exceedingly verbose for the programmer - he may
   need to write the same identifier three times
2) the obvious implementation problems in c++
3) i wonder what the performance impact of named parameters would be

2) can be approached somewhat easily at the cost of ugliness:
    ki18n("Notification from {appname}").ARG(appname)
    ki18n("Notification from {appname}").ARG2(appname, that->appName())
  which is a shorthand for:
    ki18n("Notification from {appname}").subs("appname", appname)
    ki18n("Notification from {appname}").subs("appname", that->appName())
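
the macro trick from 2) can be sketched in plain c++. `Message` is a hypothetical toy stand-in (not KLocalizedString, whose real subs() takes no name), and `subs(name, value)` is the proposed named variant; the macros just stringify the identifier so that it doubles as the placeholder name.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// toy stand-in for a translatable message with named placeholders; NOT the
// real KLocalizedString api. subs(name, value) is the proposed named variant.
class Message {
public:
    explicit Message(std::string text) : text_(std::move(text)) {}

    Message& subs(const std::string& name, const std::string& value) {
        values_[name] = value;
        return *this;
    }

    // replaces every "{name}" with its value; unknown names are left as-is.
    std::string toString() const {
        std::string out;
        std::size_t i = 0;
        while (i < text_.size()) {
            std::size_t open = text_.find('{', i);
            if (open == std::string::npos) break;
            std::size_t close = text_.find('}', open);
            if (close == std::string::npos) break;
            out.append(text_, i, open - i);
            auto it = values_.find(text_.substr(open + 1, close - open - 1));
            out += (it != values_.end()) ? it->second
                                         : text_.substr(open, close - open + 1);
            i = close + 1;
        }
        out.append(text_, i, std::string::npos);
        return out;
    }

private:
    std::string text_;
    std::map<std::string, std::string> values_;
};

// hypothetical shorthands: the identifier's spelling doubles as the placeholder name.
#define ARG(name) subs(#name, name)
#define ARG2(name, value) subs(#name, value)
```

ARG only helps when the variable happens to be named like the placeholder; ARG2 covers the other cases, at the price of spelling the name out again.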

1) can be approached this way:
    i18n("Notification from " ARG(appname))
  i.e., word puzzles which are automatically transformed into format
  strings. for the string itself it is trivial on the code side, and
  requires some magic in the string extractor. but i have no clue how to
  make an argument list out of that without using an additional
  pre-processor.

3) makes me wonder whether the names should actually be in the format
string at all, rather than just being formalized annotations which are
blended in by the tools. or, to keep the tools simpler, should the message
compiler replace the names by numbers? in both cases the binary depends on
positional parameters, which means the source code needs to use (annotated)
positional parameters, or it needs to be pre-processed.

> But I no longer remember why I thought implicit conversion is
> dangerous; and why people didn't throw at me "don't be stupid, add
> implicit conversion".
> 
well, to start with, it requires the addition of a QString constructor,
which you were hardly in a position to do ...

> > some random differences between gettext() and tr():
> >
> > - tr() uses %n instead of the first generic integer parameter for
> > identifying the plural form. i like it that way, because it's more
> > explicit.
> 
> I don't like it that way because it introduces a special placeholder for no
> particularly good reason. If you have only one integer, then there is no
> difference. If you have more than one integer, then you normally have a much
> bigger problem at hand than reduced explicitness.
>
but that's exactly my point. when you notice that you need multiple %n,
you need to start thinking. when everything is just a number, it's
easier to forget about the problem, and on the "receiving" side harder
to grasp the intention.

> (Such message usually has to be split, into as many partial plural
> messages as there are numbers, plus one joiner message; all must have
> contexts explaining the split.)
> 
alternatively, in some cases one can prioritize one count and just
require "grammatically detaching" the others:
  "Generated %n translation(s) (%1 finished and %2 unfinished)"
may need a logical transformation to
  "Generated %n translation(s) (finished: %1, unfinished: %2)"
to be translatable into slavic languages. this makes it only marginally
worse than the original, so such a shortcut may be perfectly acceptable.
using %n to denote the "anchor" number makes a lot of sense then.
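
to illustrate the "anchor" idea: only the %n count picks the plural form, while %1 and %2 are plain substitutions that the reworded message no longer has to agree with. the plural rule below is the one the gettext manual gives for russian; the function names and the russian forms are illustrative assumptions, not taken from qt.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// plural form index for russian, following the rule the gettext manual lists:
// 0 for 1, 21, 31, ...; 1 for 2-4, 22-24, ...; 2 for everything else (incl. 11-14).
int russianPluralForm(unsigned long n) {
    if (n % 10 == 1 && n % 100 != 11) return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20)) return 1;
    return 2;
}

// replaces every occurrence of the (non-empty) `token` in `text` with `value`.
std::string replaceAll(std::string text, const std::string& token, const std::string& value) {
    for (std::size_t pos = text.find(token); pos != std::string::npos;
         pos = text.find(token, pos + value.size())) {
        text.replace(pos, token.size(), value);
    }
    return text;
}

// hypothetical: only the %n "anchor" count selects the form; %1 and %2 are
// grammatically detached and substituted verbatim.
std::string translatePlural(const std::vector<std::string>& forms, unsigned long n,
                            const std::string& a1, const std::string& a2) {
    std::size_t form = static_cast<std::size_t>(russianPluralForm(n));
    std::string s = replaceAll(forms.at(form), "%n", std::to_string(n));
    s = replaceAll(s, "%1", a1);
    return replaceAll(s, "%2", a2);
}
```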

> > - tr() has no plural support for the source language. this means that the
> > messages are by definition degraded to elaborate ids
> > [...]
> > in practice, the need for an additional translation is somewhat annoying
> > (and consequently neither qt nor creator have one). i'm yet to be
> > convinced which approach is better.
> 
> I'm with you on the doubt :)
> 
excellent ;)

> I have the following additional worry about "elaborate IDs": I wouldn't want
> programmers (those who know exactly what they meant) to go and make
> meaning-changing fixes in the English translation.
> 
well, that's simply forbidden by the process. anyone who bypasses this
should be instantly shot (which means that qt should have had no german
translator, or a new one, for a long time now :D).
if a message is broken beyond repair meaning-wise, then it needs to be
changed. the string freeze doesn't apply then.

> > lupdate somewhat recently gained support for purely informative comments
> > (//:, equivalent to //gettext:). these are also used for transmitting meta
> > data (message ids, etc.). i kind of dislike this format, because it is
> > detached from the actual c++ grammar. so i'm considering a dummy argument
> > to qTr() instead:
> 
> On the contrary, I think the "detached" way is just fine. That is because
> text and context form up the message key, and changing either leads to a new
> or a fuzzy message for translators -- e.g. breaking a message freeze. Making
> the purely informative comment an argument would make it appear far closer
> in technical significance to text and context than it actually is.
> 
the meta data (including free-form comments) actually *is* of technical
significance - not to the tr() function, but to the translation process
(including the tooling; changing a message id may very well throw it
off).
the detachment from the c++ grammar poses a problem for more complex
statements with multiple translatable strings (think QMessageBox). one can
either embed the comments in "creative ways" or create temporaries before
the actual statement.
also, the magic comments are still comments and thus tend to get out of
sync with reality, or get detached from the code they belong to. i'd
expect this to be better with properly embedded meta-data (though that's
mostly just a hope).