new CSS parser
Lars Knoll
lars@trolltech.com
Wed, 15 Jan 2003 13:29:54 +0100
Hi,
I've now mostly finished my rewrite of the CSS parser in khtml. It seems to
work fine for me, although I'm pretty sure it's not bug-free yet.
Since not everyone knows why I've done this rewrite, I'll state some of the
reasons here again. While doing a fine job for most real-world web pages, the
old parser was not really standards compliant, and we got a lot of criticism
about this, e.g. on the w3c style mailing list and from people involved in
CSS:
* it couldn't handle CSS character escapes
* it had trouble handling the forward-compatible parsing rules as defined by
the standard (one example is all the posted hacks to trick khtml into
ignoring rules)
* it didn't sort out invalid syntax correctly (e.g. properties and selectors
are not allowed to start with a hyphen, something we used internally).
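To make the first and last points concrete, here is a minimal sketch of how CSS character escapes can be decoded and how a token with a raw leading hyphen can be told apart from an escaped one. It is not the khtml code; the names decodeCssEscapes and startsWithRawHyphen are made up for illustration, the input is assumed to be a single token, and a backslash before a newline is not given any special treatment.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not taken from khtml.

// Append a code point to `out` as UTF-8 (U+FFFD for out-of-range values).
void appendUtf8(std::string& out, unsigned long cp) {
    if (cp == 0 || cp > 0x10FFFF) cp = 0xFFFD;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
}

bool isCssSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
}

// Decode "\" + 1-6 hex digits (+ one optional whitespace) and "\x" escapes.
std::string decodeCssEscapes(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '\\' || i + 1 == in.size()) {
            out += in[i++];          // ordinary character or trailing backslash
            continue;
        }
        ++i;                         // skip the backslash
        if (std::isxdigit(static_cast<unsigned char>(in[i]))) {
            unsigned long cp = 0;
            int digits = 0;
            while (i < in.size() && digits < 6 &&
                   std::isxdigit(static_cast<unsigned char>(in[i]))) {
                cp = cp * 16 + hexValue(in[i]);
                ++i;
                ++digits;
            }
            // A single whitespace after the digits ends the escape and is dropped.
            if (i < in.size() && isCssSpace(in[i])) {
                if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n') ++i;
                ++i;
            }
            appendUtf8(out, cp);
        } else {
            out += in[i++];          // "\x" with non-hex x stands for x itself
        }
    }
    return out;
}

// True if the token, before any unescaping, begins with a literal '-'.
bool startsWithRawHyphen(const std::string& token) {
    return !token.empty() && token[0] == '-';
}
```

With these helpers, a token written as \2d foo does not start with a raw hyphen but still decodes to -foo, while a token written as -foo is caught by startsWithRawHyphen and can be treated as invalid.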
Another problem I had with it was that the code had slowly become rather
unmaintainable. The above problems were not fixable within the old code;
fixing them would have meant adding hacks on top of hacks.
Because of this I started rewriting the CSS parser around Christmas. It's
built more or less directly upon the syntax of CSS2.1 as found in the specs.
As a result (apart from being compliant with the standards), the amount of
code we need to maintain has shrunk significantly, and the code got a lot
simpler in certain places.
The new code still seems to render all the old pages correctly. A few of the
CSS1 tests we failed before (forward-compatible parsing, sec 7.1, and comments,
sec 1.7) now pass without a flaw. This way I hope we can get more or less
all of the CSS tests working in the long run (and at least pass the CSS1 test
suite before too long).
A few changes I made at the same time:
* I removed the @-konq-quirks hack in favor of a solution with two style
sheets: html4.css and quirks.css. This gives a clearer separation and cleaner
code in the CSSStyleSelector.
* All -konq-xxx properties and values now need to be correctly escaped for the
parser to recognise them: in CSS you have to write "\2d konq-xxx" instead of
"-konq-xxx" (and a second \ in C++ ;-). This is merely a consequence of the
parser being standards compliant now.
* I fixed the specificity calculation of the CSSSelector to correctly weight
"xxx#foo { ... }" rules compared to "xxx[id=foo]" as the standard mandates.
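The specificity point can be sketched as follows. This is only an illustration of the CSS2.1 rule (an ID selector counts in a higher bucket than an attribute selector, even one that tests the id attribute), not the actual CSSSelector code; the Part enum, the Specificity struct and computeSpecificity are hypothetical names, and the style-attribute component is left out.

```cpp
#include <tuple>
#include <vector>

// Hypothetical model of the pieces of one selector; not khtml's data types.
enum class Part { Element, Id, Class, Attribute, PseudoClass, PseudoElement };

// The three selector-derived specificity components from CSS2.1.
struct Specificity {
    int ids = 0;       // "#foo"
    int attrs = 0;     // ".bar", "[id=foo]", ":hover"
    int elements = 0;  // "xxx", ":first-line"
};

// Compare bucket by bucket, most significant first.
bool operator<(const Specificity& l, const Specificity& r) {
    return std::tie(l.ids, l.attrs, l.elements) <
           std::tie(r.ids, r.attrs, r.elements);
}

Specificity computeSpecificity(const std::vector<Part>& parts) {
    Specificity s;
    for (Part p : parts) {
        switch (p) {
        case Part::Id:
            ++s.ids;
            break;
        case Part::Class:
        case Part::Attribute:     // includes [id=foo]: it is an attribute test
        case Part::PseudoClass:
            ++s.attrs;
            break;
        case Part::Element:
        case Part::PseudoElement:
            ++s.elements;
            break;
        }
    }
    return s;
}
```

Under this model "xxx#foo" is { Element, Id } and "xxx[id=foo]" is { Element, Attribute }, so the former compares as more specific even though both match the same element.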
Some more items still need to be done:
* fix the weight of non-CSS presentational hints in the style selector to be
CSS2.1 compliant
* check all properties in CSS2.1 for changes against the 2.0 specs
* check the parser for memory leaks in case of parsing errors
* reimplement the "quirky em" hack
* smaller code cleanups
Hope that gives you a small overview of the changes. It might be worth
switching Safari over to the new code, but it is still rather new and surely
not yet tested very extensively.
Cheers,
Lars