new CSS parser

Lars Knoll lars@trolltech.com
Wed, 15 Jan 2003 13:29:54 +0100


Hi,

I've now mostly finished my rewrite of the CSS parser in khtml. It seems to 
work fine for me, although I'm pretty sure it's still not bug-free.

Since not everyone knows why I've done this rewrite, I'll state some of the 
reasons here again. While doing a fine job for most real-world web pages, the 
old parser was not really standards compliant, and we got a lot of criticism 
about this, e.g. on the W3C style mailing list and from people involved in 
CSS:

* it couldn't handle CSS character escapes
* it had trouble handling the forward compatible parsing rules as defined by 
the standard (one example is all the hacks people have posted to trick khtml 
into ignoring rules)
* it didn't sort out invalid syntax correctly (e.g. properties and selectors 
are not allowed to start with a hyphen, something we used internally).
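
To illustrate the first point, here is a minimal sketch of how CSS character 
escapes can be decoded. This is not the actual khtml code: the names 
decodeCssEscapes and appendUtf8 are made up for this example, input is assumed 
to be ASCII or UTF-8, and details such as "\r\n" counting as a single 
whitespace and backslash-newline inside strings are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: append a Unicode code point to a UTF-8 string.
static void appendUtf8(std::string& out, unsigned long cp) {
    if (cp > 0x10FFFF)
        cp = 0xFFFD;  // choice made for this sketch: map out-of-range values to U+FFFD
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical function: replace CSS escapes in `in` by the characters they name.
std::string decodeCssEscapes(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Ordinary characters (and a lone trailing backslash) are copied as-is.
        if (in[i] != '\\' || i + 1 >= in.size()) {
            out += in[i++];
            continue;
        }
        ++i;  // skip the backslash
        if (std::isxdigit(static_cast<unsigned char>(in[i]))) {
            // Hex escape: up to six hex digits name a code point.
            unsigned long cp = 0;
            int digits = 0;
            while (i < in.size() && digits < 6 &&
                   std::isxdigit(static_cast<unsigned char>(in[i]))) {
                const unsigned char c = static_cast<unsigned char>(in[i]);
                cp = cp * 16 + (std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
                ++i;
                ++digits;
            }
            // A single whitespace character after the digits ends the escape
            // and is swallowed.
            if (i < in.size() && (in[i] == ' ' || in[i] == '\t' || in[i] == '\n' ||
                                  in[i] == '\r' || in[i] == '\f'))
                ++i;
            appendUtf8(out, cp);
        } else {
            // Any other escaped character stands for itself.
            out += in[i++];
        }
    }
    return out;
}
```

With this sketch, the CSS text \2d foo decodes to -foo: 2d is the hex code of 
the hyphen, and the space after it only terminates the escape.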

Another problem I had with it was that the code had slowly become rather 
unmaintainable. The above problems were not fixable within the old code; 
trying would have meant adding hacks on top of hacks.

Because of this I started rewriting the CSS parser around Christmas. It's 
built more or less directly upon the syntax of CSS2.1 as found in the specs. 
As a result (apart from being compliant with the standards), the amount of 
code we need to maintain has shrunk significantly, and the code got a lot 
simpler in certain places.

The new code still seems to render all the old pages correctly. A few of the 
CSS1 tests we failed before (forward compatible parsing, sec 7.1 and comments 
sec 1.7) now pass without a flaw. This way I hope we can get more or less 
all of the CSS tests working in the long run (and at least pass the CSS1 test 
suite before too long).

A few changes I made at the same time:

* I removed the @-konq-quirks hack in favor of a solution with two style 
sheets: html4.css and quirks.css. This gives a clearer separation and cleaner 
code in the CSSStyleSelector.
* All -konq-xxx properties and values now need to be correctly escaped for the 
parser to recognise them: in CSS you have to write "\2d konq-xxx" instead of 
"-konq-xxx" (and a second \ in C++ ;-). This is merely a consequence of the 
parser being standards compliant now.
* I fixed the specificity calculation of the CSSSelector to correctly weight 
"xxx#foo { ... }" rules compared to "xxx[id=foo]" as the standard mandates.

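The specificity fix can be pictured with the following minimal sketch. It is 
not the real CSSSelector code: SelectorCounts and moreSpecific are names made 
up for this example. It only assumes the ordering from the specificity rules 
in the spec (section 6.4.3), where ID selectors outrank attribute selectors, 
which in turn outrank element names.

```cpp
#include <tuple>

// Hypothetical summary of one selector; not khtml's CSSSelector.
struct SelectorCounts {
    int ids;         // number of #id components
    int attributes;  // attribute selectors, classes and pseudo-classes
    int elements;    // element names and pseudo-elements
};

// True if lhs is more specific than rhs. The counts are compared
// lexicographically: ids first, then attributes, then elements.
bool moreSpecific(const SelectorCounts& lhs, const SelectorCounts& rhs) {
    return std::tie(lhs.ids, lhs.attributes, lhs.elements) >
           std::tie(rhs.ids, rhs.attributes, rhs.elements);
}
```

Under these assumptions "xxx#foo" has the counts {1, 0, 1} and "xxx[id=foo]" 
has {0, 1, 1}, so the #foo rule wins even though both match the same element.
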
Some more items still need to be done:

* fix the weight of non-CSS presentational hints in the style selector to be 
CSS2.1 compliant
* check all properties in CSS2.1 for changes against the 2.0 specs
* check the parser for memory leaks in case of parsing errors
* reimplement the "quirky em" hack
* smaller code cleanups

Hope that gives you a small overview of the changes. It might be worth 
switching Safari over to the new code, but it is still rather new and surely 
not yet tested very extensively.

Cheers,
Lars