even more on kconfig escapes (Re: KDE/kdelibs/kdeui/icons)

Thu Nov 22 06:49:01 GMT 2007

On 11/21/07, Oswald Buddenhagen <ossi at kde.org> wrote:
> [moved to core-devel]
> [warning: lots of reading ahead :)]
>
> On Wed, Nov 21, 2007 at 11:42:10AM +0100, David Faure wrote:
> > On Wednesday 21 November 2007, Andreas Pakulat wrote:
> > > I don't understand why creating groups with slashes got broken either. I
> > > suspect that "somewhere" in kconfigs code KConfigGroup::name() is used
> > > where instead the fullname is needed.
> >
> probably. will you have a look? or thomas?
> (that's independent of the rest here, i'd think)

I'll look at this tonight, but I haven't been paying much attention
to this thread, so I'm not yet sure what actually got broken.

>
> > So now I'm confused. If this KConfig behavior change wasn't expected,
> > why the kicontheme.cpp change?  The bug should be fixed in KConfig,
> > not in the KIconTheme user code...
> >
> prolly. but it isn't *that* simple: :)
>
> there are two cases where slashes appear:
> - some hierarchy, like in the icons case
> - just as part of the name, like in an url (ok, an url *is* hierarchic,
>   but in this case one should consider the structure opaque)
>
> in the first case, the slashes should appear verbatim in the group
> header. the proper way to construct such a hierarchy is by using nested
> kconfiggroups - see andreas' fix.
>
> in the second case, the slashes should be escaped somehow. the way to
> construct such a thing would be simply passing the complete name to the
> kconfiggroup ctor.
>
> that this distinction makes sense becomes obvious when you consider
> mixing the two.
> and when you consider alternative backends, it becomes obvious why the
> separator between hierarchy levels needs to be opaque to the api.
>
> so far the theory. now the practical consequences:
>
> - api. functions must treat group names as atomic, not as hierarchical:
>   - they did so far and it's too late to change now
>   - it would be just wrong anyway, as the separator needs to be opaque
>
> - on-disk format of ini files. in the current format, groups are encoded
>   just like keys: \-escape leading and trailing whitespace, control
>   chars, [, ] and =. taking " weird \ []] group / name=blah" as an
>   example yields [\sweird \\ \x5b\x5d\x5d group / name\x3dblah].
>   manually inserted [ and ] would break parsing. an = would be taken
>   literally. (*)
>
>   now we need some way to delimit hierarchy levels. proposals range from
>   the not-so-fortunate / (current), the probably better | (apaku) or the
>   even better ^ (me :). this all doesn't really *solve* the problem,
>   though - it merely lessens the probability of it surfacing. same game

Sorry about my unfortunate choice of delimiter; I should have known
it would cause problems sooner or later. I figured that if QSettings
used it, it should be OK. I should have looked at more KDE configs
instead of assuming that whatever Qt uses is safe.
So I propose an ASCII control char that I had never heard of before
and that is probably a holdover from some old hardware. It is
appropriately named: ASCII char 0x1d (Group Separator). I would
guess it is *extremely* unlikely to be used in any group name,
definitely less likely than any normal printable character. IMHO it
would be the simplest option: as you said, users shouldn't depend on
the separator, and it would only need to be changed in 1 or 2 places
in KConfigGroupPrivate. It would break any current configs that have
subgroups, though, and I don't see a way around that.
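
A minimal Python sketch of the separator idea, only to illustrate the
in-memory naming. The helper names are hypothetical and not part of
KConfig, and how 0x1d would be written to the ini file is deliberately
left out:

```python
# Hypothetical helpers, not KConfig API: join nested group names with the
# ASCII Group Separator (0x1d) instead of a printable character.
GROUP_SEP = "\x1d"

def join_group_path(levels):
    """Build the full group name from a list of nesting levels."""
    for level in levels:
        if GROUP_SEP in level:
            # Unlikely, but still possible: the separator only lowers the
            # probability of a clash, it does not remove it.
            raise ValueError("group name contains the separator: %r" % level)
    return GROUP_SEP.join(levels)

def split_group_path(full_name):
    """Recover the nesting levels from a full group name."""
    return full_name.split(GROUP_SEP)
```

With this, slashes, bars and carets in names such as "g1/1" or "g2^2"
need no special treatment, since none of them is the separator.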

>   as with list separators in values before ... so - again - two choices
>   with various variations each. example of the hour: "g1/1" with
>   subgroup "g2^2[\":
>   - add an additional layer of encoding. using / for the level
>     delimiter:
>     - using the same escape char as the underlying layer:
>       [g1\\/1/g2^2\x5b\\\\]
>     - using a different escape char:
>       [g1^/1/g2^^2\x5b\\]
>   - escape separator at the lowest layer already:
>     - pick an arbitrary char ("by pure chance, it's the slash"):
>       [g1\x2f1/g2^2\x5b\\]
>     - use a char that currently really cannot appear, as it is already
>       encoded:
>       [g1/1=g2^2\x5b\\]
>       this simply redefines "needlessly encoded in the case of groups"
>       into "reserved for later use". :-D
>   - actually, there is a third approach i haven't thought of before:
>     make the separator itself an escape at the lowest layer:
>     [g1/1\/g2^2\x5b\\]
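
The five candidate headers above can be reproduced with a short Python
sketch. All function names are made up for illustration and are not
KConfig code. The base escaping follows the rules described earlier in
the mail (\s for a leading or trailing space, \xNN for [, ], = and
control chars); the named escapes for newline, tab and carriage return
are an assumption.

```python
def escape_base(name, extra=""):
    """Key-style escaping; 'extra' lists further chars to hex-escape."""
    named = {"\n": "\\n", "\t": "\\t", "\r": "\\r"}  # assumed short forms
    out = []
    last = len(name) - 1
    for i, ch in enumerate(name):
        if ch == "\\":
            out.append("\\\\")
        elif ch == " " and i in (0, last):
            out.append("\\s")
        elif ch in named:
            out.append(named[ch])
        elif ch in "[]=" or ch in extra or ord(ch) < 0x20:
            out.append("\\x%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

def list_encode(levels, sep="/", esc="\\"):
    """Additional layer: escape 'esc' and 'sep' in each level, then join."""
    def enc(level):
        return level.replace(esc, esc + esc).replace(sep, esc + sep)
    return sep.join(enc(level) for level in levels)

def header(body):
    return "[" + body + "]"

levels = ["g1/1", "g2^2[\\"]

# Additional layer, same escape char as the underlying layer.
same_esc = header(escape_base(list_encode(levels)))
# Additional layer, different escape char (^).
other_esc = header(escape_base(list_encode(levels, esc="^")))
# Lowest layer, arbitrary separator char (/) hex-escaped inside names.
hex_slash = header("/".join(escape_base(l, extra="/") for l in levels))
# Lowest layer, = as separator (it is already escaped inside names).
equal_sign = header("=".join(escape_base(l) for l in levels))
# Lowest layer, the separator itself is an escape sequence (\/).
sep_escape = header("\\/".join(escape_base(l) for l in levels))

for line in (same_esc, other_esc, hex_slash, equal_sign, sep_escape):
    print(line)
```

In the two "additional layer" variants the base escaping runs over the
already joined string, so the leading/trailing space rule applies to the
whole header rather than to each level; a real implementation would have
to decide which is wanted.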
>
>   evaluating the proposals:
>   - additional layer:
>     - both variants look somewhat ugly, the first being particularly
>       unreadable (what coincidence that this is the encoding used for
>       list values :). using | for the delimiter would reduce the number
>       of to-be-escaped cases. otoh, who cares? :}
>     - coding-wise, this is the simpler solution: one could store the
>       list-encoded key in the entry map, thus needing changes only to
>       the code for constructing groups and returning their name.
>     - to have forward compatibility, groups with this encoding would
>       need to gain a [$h] (for "hierarchical") marker. limited backward
>       compat would be achieved by encoding only nested groups that way.
>   - lowest layer:
>     - arbitrary char:
>       - a slash or bar looks sort of most readable to me
>       - that needs the [$h] marker, too
>     - equal sign:
>       - looks just "unnatural"
>       - but needs no [$h] marker. provided we define the end of may 2007
>         as the beginning of times, but that's reasonable within the
>         constraints (*).
>     - code-wise, one could re-encode this into the list-encoded format
>       and handle it like the other case. in any case it's more code.
>       the proper solution (for either case) would be a really
>       hierarchical entry map, but that's out of scope for 4.0.
>   - lowest layer escape:
>     - looks somewhat ... backwards
>     - needs no [$h] marker
>     - code-wise it's about the same as the previous variant
>
> (*) the current group name encoding is neither backward nor forward
> compatible with kde3:
> group names were written literally, with the exception of [ and ] being
> doubled, the example thus yielding [ weird \ [[]]]] group / name=blah].
> line breaks would break parsing, other control chars would make it
> unreadable for humans.
>
> i haven't seen complaints about the broken compatibility yet, but here
> are some thoughts nonetheless:
> - i don't think this is relevant for shared kde3/kde4 configs, as they
>   don't have weird group names, afaik
> - a one-time upgrade for actually affected apps would need some special
>   format rewrite stuff in kconf_upgrade: the regular kconfig would
>   obviously break down
> - consider this forward compatible solution: [ is effectively an escape
>   char. so something as perverted as "[new\nline ]\x13here" would
>   become [[[new[nline ]][x13here]. for more readability the backslash
>   could be preserved, making it [[[new[\nline ]][\x13here].
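
For comparison, a small Python sketch of the KDE 3 style doubling and of
the proposed "[ as escape char" variant. The function names are
hypothetical, and the rules are only inferred from the two examples
above: how a literal backslash or leading/trailing whitespace would be
handled is not specified there, so the sketch leaves them untouched.

```python
def encode_kde3(name):
    """KDE 3 style as described above: only [ and ] are doubled."""
    return name.replace("[", "[[").replace("]", "]]")

def encode_bracket_escape(name, keep_backslash=False):
    """Variant where '[' introduces escapes for control characters."""
    named = {"\n": "n", "\t": "t", "\r": "r"}  # assumed short forms
    prefix = "[\\" if keep_backslash else "["
    out = []
    for ch in name:
        if ch == "[":
            out.append("[[")
        elif ch == "]":
            out.append("]]")
        elif ch in named:
            out.append(prefix + named[ch])
        elif ord(ch) < 0x20:
            out.append(prefix + "x%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

def header(body):
    return "[" + body + "]"

print(header(encode_kde3(" weird \\ []] group / name=blah")))
print(header(encode_bracket_escape("[new\nline ]\x13here")))
print(header(encode_bracket_escape("[new\nline ]\x13here", keep_backslash=True)))
```

For names without control characters both functions produce the same
output, which is the point of the forward compatible variant.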
>
> assume we implement the forward compatible variant. again, use the
> example "g1/1" with subgroup "g2^2[\" and / for the delimiter, the
> encoding options are:
> - add an additional layer of encoding:
>   - using the same escape char as the underlying layer:
>     [g1[[/1/g2^2[[[[\]
>   - using a different escape char:
>     [g1^/1/g2^^2[[\]
> - escape separator at the lowest layer already:
>   [g1[/1/g2^2[[\]
> - make the separator itself an escape at the lowest layer:
>   [g1/1[/g2^2[[\]
>
> evaluation: same as above.
>
>
> and just in the moment i wanted to conclude the mail, i've got yet
> another idea ... how about encoding it like this:
> [g1/1][g2^2<whatever other encoding here>]
> a problem here is that the immutability marker is as such a valid group
> name and would thus create ambiguity (btw, this is also a problem for
> the stand-alone file immutability marker). one could remedy this by
> encoding a leading $ in group names as [$.
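
A rough Python sketch of this "one bracket pair per level" layout,
assuming the current backslash-based encoding is used inside each pair
(the "whatever other encoding" above). The function names are
hypothetical, and the leading-$ ambiguity with the immutability marker
is not handled here.

```python
def escape_level(name):
    """Reduced backslash escaping: \\, [, ], = and control chars only."""
    out = []
    for ch in name:
        if ch == "\\":
            out.append("\\\\")
        elif ch in "[]=" or ord(ch) < 0x20:
            out.append("\\x%02x" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)

def build_header(levels):
    """Write each nesting level in its own pair of brackets."""
    return "".join("[" + escape_level(level) + "]" for level in levels)

def split_header(line):
    """Split a header into its (still escaped) levels.

    Splitting on "][" is unambiguous here only because escape_level()
    never emits a raw [ or ] inside a level.
    """
    if not (line.startswith("[") and line.endswith("]")):
        raise ValueError("not a group header: %r" % line)
    return line[1:-1].split("][")
```

The separator never has to be escaped inside a name in this layout, so
"g1/1" stays readable as-is in the file.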
>
> concluding questions (yeah, finally ... ;):
> - should we restore backwards compat?
> - which separator encoding to choose?
>   - i tend to favor my last-minute idea even if i spent only five
>     minutes developing it, as opposed to five hours on the rest. ;)
>   - second option would be the "regular" lowest-layer encoding (without
>     the = hack in the non-backward-compatible variant). i guess i'd
>     favor / over |, but i'm undecided.
>
> whew, things take much less time when you don't try to consider
> everything. ;)

I would favor the simplest option first; there's no reason to
over-engineer it. Use an extremely unlikely char, as in my proposal.
If that isn't good enough, try one of the other alternatives, though
I don't know which one would be least ugly code-wise and file-wise :)
Your last idea does seem interesting, though.