even more on kconfig escapes (Re: KDE/kdelibs/kdeui/icons)

Oswald Buddenhagen ossi at kde.org
Wed Nov 21 18:29:44 GMT 2007

[moved to core-devel]
[warning: lots of reading ahead :)]

On Wed, Nov 21, 2007 at 11:42:10AM +0100, David Faure wrote:
> On Wednesday 21 November 2007, Andreas Pakulat wrote:
> > I don't understand why creating groups with slashes got broken either. I
> > suspect that "somewhere" in kconfigs code KConfigGroup::name() is used
> > where instead the fullname is needed.
probably. will you have a look? or thomas?
(that's independent of the rest here, i'd think)

> So now I'm confused. If this KConfig behavior change wasn't expected,
> why the kicontheme.cpp change?  The bug should be fixed in KConfig,
> not in the KIconTheme user code...
prolly. but it isn't *that* simple: :)

there are two cases where slashes appear:
- some hierarchy, like in the icons case
- just as part of the name, like in a url (ok, a url *is* hierarchic,
  but in this case one should consider the structure opaque)

in the first case, the slashes should appear verbatim in the group
header. the proper way to construct such a hierarchy is by using nested
kconfiggroups - see andreas' fix.

in the second case, the slashes should be escaped somehow. the way to
construct such a thing would be simply passing the complete name to the
kconfiggroup ctor.

that this distinction makes sense becomes obvious when you consider
mixing the two.
and when you consider alternative backends, it becomes obvious why the
separator between hierarchy levels needs to be opaque to the api.
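
a small sketch of why the two cases must be kept apart. it is plain
python, not kconfig code; the group names and both helper functions are
made up for illustration. a group path is modelled as a tuple of atomic
names, and the "naive" functions stand for treating / as both separator
and ordinary name character:

```python
# hypothetical model: a group path is a tuple of atomic (opaque) names
hierarchy = ("Desktop", "Animated")          # slash-free names, real nesting
mixed = ("Bookmarks", "http://kde.org/")     # nesting plus a url used as a name


def naive_join(path):
    """Flatten a path by joining the names with a bare slash."""
    return "/".join(path)


def naive_split(flat):
    """Recover a path by splitting on every slash."""
    return tuple(flat.split("/"))


# a pure hierarchy survives the round trip ...
assert naive_split(naive_join(hierarchy)) == hierarchy
# ... but as soon as a name contains the separator, the structure is lost
assert naive_split(naive_join(mixed)) != mixed
```

keeping the path as a list of names in the api, and leaving the choice of
separator to the backend, avoids the second failure.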

so far the theory. now the practical consequences:

- api. functions must treat group names as atomic, not as hierarchical:
  - they did so far and it's too late to change now
  - it would be just wrong anyway, as the separator needs to be opaque

- on-disk format of ini files. in the current format, groups are encoded
  just like keys: \-escape leading and trailing whitespace, control
  chars, [, ] and =. taking " weird \ []] group / name=blah" as an
  example yields [\sweird \\ \x5b\x5d\x5d group / name\x3dblah].
  manually inserted [ and ] would break parsing. an = would be taken
  literally. (*)
  now we need some way to delimit hierarchy levels. proposals range from
  the not-so-fortunate / (current) through the probably better | (apaku)
  to the even better ^ (me :). this all doesn't really *solve* the problem,
  though - it merely lessens the probability of it surfacing. same game
  as with list separators in values before ... so - again - two choices
  with various variations each. example of the hour: "g1/1" with
  subgroup "g2^2[\":
  - add an additional layer of encoding. using / for the level delimiter:
    - using the same escape char as the underlying layer:
    - using a different escape char:
  - escape separator at the lowest layer already:
    - pick an arbitrary char ("by pure chance, it's the slash"):
    - use a char that currently really cannot appear, as it is already
      escaped: the equal sign.
      this simply redefines "needlessly encoded in the case of groups"
      into "reserved for later use". :-D
  - actually, there is a third approach i haven't thought of before:
    make the separator itself an escape at the lowest layer:
  evaluating the proposals:
  - additional layer:
    - both variants look somewhat ugly, the first being particularly
      unreadable (what coincidence that this is the encoding used for
      list values :). using | for the delimiter would reduce the number
      of to-be-escaped cases. otoh, who cares? :}
    - coding-wise, this is the simpler solution: one could store the
      list-encoded key in the entry map, thus needing changes only to
      the code for constructing groups and returning their name.
    - to have forward compatibility, groups with this encoding would
      need to gain a [$h] (for "hierarchical") marker. limited backward
      compat would be achieved by encoding only nested groups that way.
  - lowest layer:
    - arbitrary char:
      - a slash or bar looks sort of most readable to me
      - that needs the [$h] marker, too
    - equal sign:
      - looks just "unnatural"
      - but needs no [$h] marker. provided we define the end of may 2007
        as the beginning of times, but that's reasonable within the
        constraints (*).
    - code-wise, one could re-encode this into the list-encoded format
      and handle it like the other case. in any case it's more code.
      the proper solution (for either case) would be a really
      hierarchical entry map, but that's out of scope for 4.0.
  - lowest layer escape:
    - looks somewhat ... backwards
    - needs no [$h] marker
    - code-wise it's about the same as the previous variant
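
the current on-disk group encoding described at the start of the
"on-disk format" item can be sketched roughly as follows. this is an
illustration in python, not the kconfig implementation: the function
name is made up, and the exact treatment of control chars (\n, \t, \r
by name, everything else as \xNN) and of whitespace (only a single
leading or trailing space) is an assumption. only the example from the
mail is known to come out as stated there.

```python
def encode_group_name(name):
    """Escape one atomic group name and wrap it in a [header]."""
    # assumption: the backslash and a few control chars have named escapes
    named = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\r": "\\r"}
    last = len(name) - 1
    out = []
    for i, ch in enumerate(name):
        if ch in named:
            out.append(named[ch])
        elif ch == " " and i in (0, last):
            out.append("\\s")                  # leading/trailing space
        elif ch in "[]=" or ord(ch) < 0x20:
            out.append("\\x%02x" % ord(ch))    # [, ], = and other control chars
        else:
            out.append(ch)
    return "[" + "".join(out) + "]"


# the example from the mail
print(encode_group_name(" weird \\ []] group / name=blah"))
```

note that the slash passes through untouched, which is exactly why it
cannot double as a level separator without further rules.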

(*) the current group name encoding is neither backward nor forward
compatible with kde3:
group names were written literally, with the exception of [ and ] being
doubled, the example thus yielding [ weird \ [[]]]] group / name=blah]. 
line breaks would break parsing, other control chars would make it
unreadable for humans.
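
for comparison, the kde3 rule described above (everything literal, [ and
] doubled) fits in one line. again a python sketch with a made-up
function name, checked only against the example in this mail:

```python
def encode_group_name_kde3(name):
    """Write the name literally, doubling every [ and ]."""
    return "[" + name.replace("[", "[[").replace("]", "]]") + "]"


# the example from the mail
print(encode_group_name_kde3(" weird \\ []] group / name=blah"))
```

a line break inside the name is copied through unchanged, which is the
parsing breakage mentioned above.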

i haven't seen complaints about the broken compatibility yet, but here
are some thoughts nonetheless:
- i don't think this is relevant for shared kde3/kde4 configs, as they
  don't have weird group names, afaik
- a one-time upgrade for actually affected apps would need some special
  format rewrite stuff in kconf_upgrade: the regular kconfig would
  obviously break down
- consider this forward compatible solution: [ is effectively an escape
  char. so something as perverted as "[new\nline ]\x13here" would
  become [[[new[nline ]][x13here]. for more readability the backslash
  could be preserved, making it [[[new[\nline ]][\x13here].
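
a sketch of that forward compatible variant, in python with made-up
names. assumptions: \n, \t and \r get named escapes, other control chars
become x plus two hex digits, and leading/trailing whitespace is not
treated specially (the mail does not say how it would be).

```python
def encode_group_name_compat(name, keep_backslash=False):
    """Encode a group name using [ as the escape character."""
    named = {"\n": "n", "\t": "t", "\r": "r"}
    # optionally keep a backslash after the [ for readability
    lead = "[\\" if keep_backslash else "["
    out = []
    for ch in name:
        if ch == "[":
            out.append("[[")                   # same as the kde3 doubling
        elif ch == "]":
            out.append("]]")
        elif ch in named:
            out.append(lead + named[ch])
        elif ord(ch) < 0x20:
            out.append(lead + "x%02x" % ord(ch))
        else:
            out.append(ch)
    return "[" + "".join(out) + "]"


weird = "[new\nline ]\x13here"
print(encode_group_name_compat(weird))
print(encode_group_name_compat(weird, keep_backslash=True))
```

names without control chars come out exactly as in kde3, which is the
point of the exercise.
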
assume we implement the forward compatible variant. again, use the
example "g1/1" with subgroup "g2^2[\" and / for the delimiter, the
encoding options are:
- add an additional layer of encoding:
  - using the same escape char as the underlying layer:
  - using a different escape char:
- escape separator at the lowest layer already:
- make the separator itself an escape at the lowest layer:

evaluation: same as above.

and just at the moment i wanted to conclude the mail, i've got yet
another idea ... how about encoding it like this:
[g1/1][g2^2<whatever other encoding here>]
a problem here is that the immutability marker is as such a valid group
name and would thus create ambiguity (btw, this is also a problem for
the stand-alone file immutability marker). one could remedy this by
encoding a leading $ in group names as [$.
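
a sketch of the header writer for this idea, in python. everything here
is hypothetical: the function names, the spelling [$i] for the
immutability marker, and above all the per-level encoding, of which only
the leading-$ rule is implemented (so names containing [ or ] are not
handled).

```python
IMMUTABLE = "[$i]"  # assumed spelling of the immutability marker


def encode_level(name):
    """Encode one level; only the leading-$ rule from the mail is applied."""
    if name.startswith("$"):
        return "[$" + name[1:]
    return name


def group_header(path, immutable=False):
    """Write one bracket pair per hierarchy level, then the optional marker."""
    line = "".join("[" + encode_level(n) + "]" for n in path)
    return line + (IMMUTABLE if immutable else "")


# an immutable group "g1" and a subgroup literally named "$i" no longer collide
print(group_header(["g1"], immutable=True))
print(group_header(["g1", "$i"]))
```

the separator between levels is then the ][ pair itself, so a slash in a
name like "g1/1" needs no escaping at all.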

concluding questions (yeah, finally ... ;):
- should we restore backwards compat?
- which separator encoding to choose?
  - i tend to favor my last-minute idea even if i spent only five
    minutes developing it, as opposed to five hours on the rest. ;)
  - second option would be the "regular" lowest-layer encoding (without
    the = hack in the non-backward-compatible variant). i guess i'd
    favor / over |, but i'm undecided.

whew, things take much less time when you don't try to consider
everything. ;)

Hi! I'm a .signature virus! Copy me into your ~/.signature, please!
Chaos, panic, and disorder - my work here is done.
