Translation in Qt5

Sat Jul 2 18:27:35 BST 2011

>> [: Oswald Buddenhagen :]
>> - for the inline markup, chusslove gave me a list with 10-15 items when i
>> asked "what would you do differently if you could do KUIT again?". half
>> of it is all greek to me [...]
>
> [: Chusslove Illich :]
> [...] convert that list of problems that I gave you into a non-all-greek
> text, since all those problems still stand as they are. [...]

Attached.

-- 
Chusslove Illich (Часлав Илић)
-------------- next part --------------
1. Which messages should be KUIT-aware?

When text contains markup, some special characters need to be escaped when
they are wanted literally in the text. In XML-like markup this is at least <,
which is escaped as &lt;, and then usually > is escaped as &gt; for balance.
Since tags may have attributes with double- or single-quoted values,
there are also escapes for " as &quot; and ' as &apos;. Finally, & itself
needs to be escaped as &amp; when it is wanted literally.
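A minimal C++ sketch of the escaping described above. It uses plain std::string rather than Qt types, and the function name xmlEscape is hypothetical, not part of KUIT or Qt.

```cpp
#include <cassert>
#include <string>

// Escape the five characters that are special in XML-like markup,
// so that the text survives KUIT processing as literal text.
std::string xmlEscape(const std::string &text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        case '&':  out += "&amp;";  break;
        default:   out += c;        break;
        }
    }
    return out;
}
```

For example, xmlEscape("Open <file> for editing") yields "Open &lt;file&gt; for editing".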

This means that a programmer using KUIT has to remember to escape
the literal text he types in (e.g. "Open <file> for editing" as
"Open &lt;file&gt; for editing"), and also the text which is substituted
as an argument (e.g. in "Use '%1' as the search expression" the argument
inserted for %1 has to be XML-escaped). The latter is particularly problematic,
because the argument may be another translated message, already expanded
from KUIT to e.g. Qt rich text, which must not be escaped again. So the
programmer always has to think about whether or not to escape each argument.
One possibility would be some auto-escaping (testing whether the argument
can be interpreted as a proper expansion of another KUIT message),
but this could cause unpredictable behavior.
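To make the argument problem concrete, here is a sketch with a hypothetical substArg() helper (not a KDE function). The argIsMarkup flag stands for the decision the programmer has to make by hand: escaping literal text is required to keep the message valid, while escaping an already expanded message turns its tags into visible text.

```cpp
#include <cassert>
#include <string>

// Compact escaping of <, > and & (enough for this illustration).
std::string escapeText(const std::string &s)
{
    std::string out;
    for (char c : s) {
        if (c == '<')      out += "&lt;";
        else if (c == '>') out += "&gt;";
        else if (c == '&') out += "&amp;";
        else               out += c;
    }
    return out;
}

// Hypothetical helper: substitute the first "%1" in msg with arg.
// argIsMarkup == false: arg is literal text and gets escaped.
// argIsMarkup == true:  arg is an already expanded message, inserted as is.
std::string substArg(const std::string &msg, const std::string &arg,
                     bool argIsMarkup)
{
    const std::string::size_type pos = msg.find("%1");
    if (pos == std::string::npos)
        return msg;
    std::string result = msg;
    result.replace(pos, 2, argIsMarkup ? arg : escapeText(arg));
    return result;
}
```

With the literal argument "a<b", substArg("Use '%1' as the search expression", "a<b", false) gives "Use 'a&lt;b' as the search expression". With an expanded argument "<b>notes.txt</b>", passing false instead of true would produce "&lt;b&gt;notes.txt&lt;/b&gt;", which the user would see as literal tags.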

With Qt in particular, there is also the matter of & as the accelerator marker.
Should it always be required to be escaped as &amp; (which would be strict XML),
or only when it could be interpreted as the start of another escape sequence
(or of any &foobar; substring)?

This leads to the possibility that some programmers will not want to use KUIT
at all, and others may want to use it only on a subset of messages.
Should this be allowed? If so, should it be possible to activate/deactivate
KUIT on a message-by-message basis, or only globally (in some suitable sense)?

If KUIT can be deactivated on all or a subset of messages, then translators
cannot add KUIT markup on their own in translations of those messages.
They have to follow the lead of the original. Since the translator may be
more markup-savvy than the programmer, it would be a pity to have this
restriction.

The current state in KDE is that all i18n'ed messages are unconditionally
KUIT, there is no auto-escaping, and raw & characters are allowed as long as
they are not positioned as the start of a &foobar; substring.
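One way the "raw & unless it looks like &foobar;" rule could be applied is a normalization pass before parsing. This is a sketch under that assumption: the entity pattern (letters, digits or #, then ;) and the names startsEntity and normalizeAmpersands are hypothetical, not taken from KDE's code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if the '&' at text[pos] starts an entity-like "&name;" substring
// (one or more letters, digits or '#', then ';'). Simplified assumption.
bool startsEntity(const std::string &text, std::string::size_type pos)
{
    std::string::size_type i = pos + 1;
    while (i < text.size()
           && (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '#'))
        ++i;
    return i > pos + 1 && i < text.size() && text[i] == ';';
}

// Escape only those '&' characters that cannot be mistaken for an entity,
// so an accelerator marker such as "&Open" becomes "&amp;Open" for parsing.
std::string normalizeAmpersands(const std::string &text)
{
    std::string out;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '&' && !startsEntity(text, i))
            out += "&amp;";
        else
            out += text[i];
    }
    return out;
}
```

The drawback follows directly from the rule: a literal substring like "&foobar;" typed by the programmer is still taken as an entity.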

2. How to recover from markup errors?

A message with markup may end up invalid, either due to typos in the literal
text or, more importantly, through argument substitution without proper
escaping. The question is how the expansion should react to this.

If no recovery is attempted and the processing just stops on error,
then the application user would see raw tags and possibly badly mixed up text.
To avoid raw markup, all tag-alike substrings could be removed, but that
could lead to loss of information (e.g. "Open <file2> for editing"
becomes "Open  for editing"). So the best option is probably some sort
of heuristic recovery, like HTML renderers do.
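The information loss from removing all tag-like substrings is easy to reproduce. stripTags() is a hypothetical name for this naive fallback.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Naive fallback: delete every tag-like substring "<...>".
// Literal text that merely looks like a tag is deleted too.
std::string stripTags(const std::string &text)
{
    static const std::regex tagLike("<[^<>]*>");
    return std::regex_replace(text, tagLike, "");
}
```

stripTags("Open <file2> for editing") returns "Open  for editing", losing the argument.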

The current state in KDE is that when a markup error happens,
the engine looks for all closed <foobar>...</foobar> patterns (with a regex)
and replaces them with the expansions associated with those tags.
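A sketch of this kind of regex-based recovery, assuming a small hypothetical table of plain-text expansions (the real tables are larger and depend on the output format). Only properly closed pairs with a known tag name are expanded; everything else, including a stray <file2>, is kept verbatim, so nothing is lost.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>

// Hypothetical plain-text expansions: opening and closing string per tag.
const std::map<std::string, std::pair<std::string, std::string>> kPlainExpansions = {
    {"filename", {"'", "'"}},
    {"command",  {"'", "'"}},
};

// Recovery pass for a message that failed to parse as markup: expand every
// closed <tag>...</tag> pair whose tag is known, keep all other text as is.
std::string recoverMarkup(const std::string &text)
{
    // \1 makes the closing tag name match the opening one.
    static const std::regex closedPair(R"(<(\w+)>([^<>]*)</\1>)");
    std::string out;
    std::string::size_type last = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), closedPair), end;
         it != end; ++it) {
        const std::smatch &m = *it;
        const auto pos = static_cast<std::string::size_type>(m.position(0));
        out += text.substr(last, pos - last);   // text before the match
        const auto exp = kPlainExpansions.find(m[1].str());
        if (exp != kPlainExpansions.end())
            out += exp->second.first + m[2].str() + exp->second.second;
        else
            out += m[0].str();                  // unknown tag: keep raw text
        last = pos + static_cast<std::string::size_type>(m.length(0));
    }
    out += text.substr(last);                   // text after the last match
    return out;
}
```

For a message broken by an unescaped argument, recoverMarkup("Copy <filename>a.txt</filename> to <b.txt") gives "Copy 'a.txt' to <b.txt".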

3. How to select output format?

KUIT markup lives only until the message is returned from an i18n() call,
which means that the resulting generic string contains its expansion.
E.g. if the target format is Qt rich text, the <filename>...</filename>
sections will expand to <b>...</b> after the i18n() call returns, and if
the target format is plain text, they will expand to '...'.
The question then is how i18n() should select the output format.
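The format dependence can be shown with a single tag. This sketch hard-codes the two expansions of <filename> given above; the Format enum and expandFilename() are hypothetical, and real KUIT expansion handles many tags and nesting. The point is that the function cannot do its job without being told the target format.

```cpp
#include <cassert>
#include <regex>
#include <string>

enum class Format { Plain, Rich };

// Expand <filename>...</filename> according to the target format:
// Qt rich text gets <b>...</b>, plain text gets '...'.
std::string expandFilename(const std::string &kuit, Format fmt)
{
    // Non-greedy .*? so that two <filename> elements are expanded separately.
    static const std::regex filename("<filename>(.*?)</filename>");
    return std::regex_replace(kuit, filename,
                              fmt == Format::Rich ? "<b>$1</b>" : "'$1'");
}
```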

One simple solution is to have only one target format,
Qt rich text. This means that KUIT could be used only in messages where
Qt rich text could be used. But this largely defeats one of the main points
of having semantic markup.

Another way would be to state the output format explicitly.
This could be done with outer tags directly in the text,
e.g. i18n("<kuit fmt='rich'>...</kuit>"), but that is probably too ugly.
Another obvious way would be through the call name, e.g. i18n_rich("..."),
but this too is ugly.

The third possibility is through keywords in the context string.

The current state in KDE is that @-context markers determine the output format.
If there is no context marker, the output format is plain text by default.
Some context markers imply rich text while others imply plain text (this is
documented and considered part of the KUIT specification), but there is also
a switch for explicit selection.
If there is no marker but there is already some Qt rich text inside,
then the output is rich text.
Examples:
  i18n("...") -> plain (default)
  i18nc("@info", "...") -> rich (implied by @info)
  i18nc("@info/plain", "...") -> plain (explicit switch by /plain)
  i18n("Blah ... <i>blah</i> ...") -> rich (implied by presence of Qt r.t.)
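These selection rules can be sketched as a small function. Assumptions, none of which come from KDE's code: the marker is the first whitespace-delimited token of the context, only @info implies rich text (the full list is defined by the KUIT specification), /rich is accepted as the counterpart of /plain, and looksLikeQtRichText() is a crude stand-in for real rich-text detection.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

enum class Format { Plain, Rich };

// Crude, hypothetical check for "the text already contains Qt rich text":
// it only looks for a few common Qt rich-text tags.
bool looksLikeQtRichText(const std::string &text)
{
    for (const char *tag : {"<b>", "<i>", "<u>", "<p>", "<br>", "<br/>"})
        if (text.find(tag) != std::string::npos)
            return true;
    return false;
}

// Pick the output format from the context string and the message text.
Format selectFormat(const std::string &context, const std::string &text)
{
    // No @-marker: plain text, unless Qt rich text is already present.
    if (context.empty() || context[0] != '@')
        return looksLikeQtRichText(text) ? Format::Rich : Format::Plain;

    // The marker is taken to be the first whitespace-delimited token,
    // e.g. "@info" or "@info/plain".
    const std::string marker = context.substr(0, context.find_first_of(" \t"));

    // An explicit switch after '/' overrides what the marker implies.
    const std::string::size_type slash = marker.find('/');
    if (slash != std::string::npos) {
        const std::string sw = marker.substr(slash + 1);
        if (sw == "plain")
            return Format::Plain;
        if (sw == "rich")
            return Format::Rich;
    }

    // Otherwise the role (between '@' and the first ':' or '/') decides.
    // Only @info is treated as implying rich text in this sketch.
    const std::string role = marker.substr(1, marker.find_first_of(":/") - 1);
    return role == "info" ? Format::Rich : Format::Plain;
}
```

Under these assumptions the four examples above come out as listed: plain, rich, plain, rich.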

4. Mixing with other markup?

What about other markup used in the same message with KUIT, e.g. Qt rich text?
Normally mixing markups is never allowed, but in our special circumstances
forbidding it is hardly possible. Even if forbidding it were acceptable
on the face of it, there would still be the problem of already expanded
messages used as arguments to other messages (see point 1).

The current state in KDE is therefore to allow mixing of KUIT and Qt rich text
(the presence of Qt rich text also forces the output format, see point 3).
It helps that the two sets of tags are mostly disjoint, save for the KUIT
<title> tag, but this does not cause trouble because it gets expanded to
the Qt rich text <title> anyway.
