KHTML Paged Media - Status Report

Allan Sandfeld Jensen kde at carewolf.com
Sat Aug 20 12:31:57 CEST 2005


Hi

I've noticed a few status reports here, and would just like to add mine.

GROUND-WORK:
I started the project by reading the spec thoroughly and running small
experiments to figure out what was possible.

First it should be noted that the current implementation of pagination in
KHTML is, well... wrong. It paints the document unpaged, and then just tries
to choose good places to cut. This leads to many poor cuts, and means that
forced page-breaks (such as page-break-after: always) make a full premature
break across the whole page.

The first step of fixing printing in KHTML was rewriting all page-break logic,
and moving page-break decisions from painting time to layout time. This makes
it possible to move multiple blocks, and to move them different distances.

I considered a wide range of implementations, with the first goal of
paginating the entire document during one layout. It turned out there were
many problems with such a solution. I've written a little framework for it,
but ultimately abandoned the idea, at least until I have layout/page-breaking
of one page at a time working.

Before starting to mess too much with the block layout I first ported a major
clean-up from WebCore, making sure our code bases were as similar as possible
so that my project would not increase the split.

IMPLEMENTATION:
I then removed all the old truncation code, and put page-break decisions into
the layout of blocks. I've written two different page-break logics, one for
blocks containing block children and one for blocks containing inline
children.

The block-children code is simple. It assumes it is the responsibility of the
children to split themselves, setting a flag if they succeed. If the parent
discovers a child that crosses a page-break but has not set the flag, it will
attempt to move the child below the page-break. It will do the same for blocks
that have a CSS forced page-break set.
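
As a rough illustration of the parent-side rule described above (not the
actual KHTML code, which is C++), here is a minimal Python sketch. The Block
class, its field names and place_children() are hypothetical names made up
for this example, and positions are simplified to a single y coordinate.

```python
from dataclasses import dataclass


@dataclass
class Block:
    top: float                         # y position of the top edge
    height: float
    split_itself: bool = False         # set by a child that broke itself across the page
    forced_break_before: bool = False  # stands in for page-break-before: always


def place_children(children, page_height):
    """Move children down so that none straddles a page boundary unsplit."""
    shift = 0.0  # distance earlier moves have pushed the following siblings
    for child in children:
        child.top += shift
        page_bottom = (child.top // page_height + 1) * page_height
        crosses = child.top + child.height > page_bottom
        at_page_top = child.top % page_height == 0
        wants_move = (crosses and not child.split_itself) or child.forced_break_before
        # A child already at the top of a page cannot be helped by moving it.
        if wants_move and not at_page_top:
            delta = page_bottom - child.top
            child.top += delta
            shift += delta
    return children
```

Because each child can be pushed by a different delta, later siblings end up
moved by the accumulated shift, which is what deciding at layout time rather
than at paint time allows.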

To handle page-break-*: avoid, I decided to introduce new anonymous blocks
that contain runs of children that "avoid" page-breaks between them, and to
set page-break-inside: avoid on those blocks. With this being done before
layout, the only CSS the layout has to handle is page-break-after/before:
always and page-break-inside: avoid. This means it can be done progressively.
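
The grouping step can be sketched as follows. This is only an illustration
under simplifying assumptions: children are modelled as hypothetical
(name, avoid_before, avoid_after) tuples, and group_avoid_runs() is a made-up
helper, not a KHTML function. Each returned group with more than one member
corresponds to an anonymous block carrying page-break-inside: avoid.

```python
def group_avoid_runs(children):
    """Group consecutive children that must not be separated by a page-break.

    children: list of (name, avoid_before, avoid_after) tuples, where the
    booleans stand in for page-break-before/after: avoid.
    """
    groups = []
    for i, (name, avoid_before, _avoid_after) in enumerate(children):
        # A child is glued to its predecessor if either side asks for it.
        glued = i > 0 and (avoid_before or children[i - 1][2])
        if glued:
            groups[-1].append(name)
        else:
            groups.append([name])
    return groups


runs = group_avoid_runs([
    ("h1", False, True),    # heading: avoid a break after it
    ("p1", False, False),
    ("p2", False, False),
    ("fig", True, False),   # figure: avoid a break before it
])
print(runs)  # [['h1', 'p1'], ['p2', 'fig']]
```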

Page-breaking inline children is, in its simple form, done in much the same
way, except that lines are always assumed not to break themselves, and are
always moved below the page-break if they cross it. Handling the CSS orphans
and widows properties turned out to make it harder though. Handling an
orphans violation is simple: just don't break, and let the parent move the
block across the page-break. Handling a widows violation is much harder.
Since we lay out one line at a time, it is impossible to know, when we
encounter a page-break, whether we will later violate widows. The solution so
far has been to postpone the widows check until all lines have been laid out.
If there is a violation I then set a hint as to which line _should_ have been
broken at, and redo the layout.
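
The two-pass widows handling can be sketched like this. It is a simplified
model, not the real code: lines are counted rather than measured, and
layout_lines(), paginate_paragraph() and the break_hint parameter are
hypothetical names. It also assumes at least one line fits on the current
page.

```python
def layout_lines(n_lines, room_on_page, page_capacity, break_hint=None):
    """Lay out lines one at a time; return the number of lines per page.

    break_hint: absolute line index before which a break is forced.
    """
    fragments = []
    placed = 0
    room = room_on_page
    while placed < n_lines:
        take = min(room, n_lines - placed)
        if break_hint is not None and placed < break_hint <= placed + take:
            take = break_hint - placed  # break early, as the hint asks
        fragments.append(take)
        placed += take
        room = page_capacity  # following pages start empty
    return fragments


def paginate_paragraph(n_lines, room_on_page, page_capacity, orphans=2, widows=2):
    """Return lines per page, or None if the parent should move the block."""
    frags = layout_lines(n_lines, room_on_page, page_capacity)
    if len(frags) == 1:
        return frags
    if frags[0] < orphans:
        return None  # orphans violated: don't break here
    if frags[-1] < widows:
        # Widows violated: hint where the last break should have been, redo.
        frags = layout_lines(n_lines, room_on_page, page_capacity,
                             break_hint=n_lines - widows)
        if frags[0] < orphans:
            return None
    return frags
```

For example, a 10-line paragraph with room for 9 lines first lays out as 9 + 1
lines; the single widow triggers a second pass with the hint set to line 8,
giving 8 + 2.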


With the new page-break code implemented, I've spent much of the remaining
time fixing corner-cases and bugs in the implementation. At this point tables
are still generating many page-break bugs, and many objects are not cropped
correctly at page-breaks.

STANDARD TROUBLE:
Another issue I've been studying is the DPI question. Previously we have been
using 72 DPI. Since that is lower than most screen DPIs, it generates larger
fonts than those seen on the screen. It seems that for good rendering of most
webpages the DPI should be assumed constant (at 96). This has the consequence
of requiring a dppx (dots per px) factor, making the CSS px value an abstract
size.
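
The arithmetic behind the dppx factor, as a small sketch. The function names
and the 600 DPI printer are assumptions chosen for the example; only the
96 px per inch figure comes from the text above.

```python
CSS_PX_PER_INCH = 96  # the constant DPI assumed for CSS px
PT_PER_INCH = 72      # a point is 1/72 inch


def dppx(device_dpi):
    """Device dots per CSS px."""
    return device_dpi / CSS_PX_PER_INCH


def pt_to_px(pt):
    """Convert points to abstract CSS px."""
    return pt * CSS_PX_PER_INCH / PT_PER_INCH


def px_to_dots(px, device_dpi):
    """Convert abstract CSS px to device dots."""
    return px * dppx(device_dpi)


print(dppx(600))                      # 6.25
print(pt_to_px(12))                   # 16.0
print(px_to_dots(pt_to_px(12), 600))  # 100.0
```

So on a hypothetical 600 DPI printer a 12pt font is 16 px, which maps to 100
device dots, while on a 96 DPI screen the factor is exactly 1.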

The last, ironic, standards problem is websites importing their style-sheets
with a "screen" media selector. This means the style-sheet doesn't apply for
"print" media, basically producing an unstyled webpage. It appears that, to
support such broken web-sites, a "screen" media selector should be treated as
"all".
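
A minimal sketch of that quirk, assuming a comma-separated media list as
found on a style-sheet import. media_applies() and its screen_as_all switch
are hypothetical names for illustration, not KHTML API.

```python
def media_applies(media_list, target, screen_as_all=True):
    """Decide whether a style-sheet with this media list applies to target."""
    types = {m.strip().lower() for m in media_list.split(",") if m.strip()}
    if not types or "all" in types:
        return True  # an empty list means all media
    if target in types:
        return True
    # Quirk: treat "screen" as "all" so screen-only sheets still apply.
    return screen_as_all and "screen" in types


print(media_applies("screen", "print"))                       # True
print(media_applies("screen", "print", screen_as_all=False))  # False
```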

I've decided to extend CSS 2.1 as well. For presentations and other non-print
paged media it is just not good enough to use a "@media print" selector. I am
experimenting with adding media-groups as valid selectors, making "@media
paged" and "@media static" possible.
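
Matching on media groups could look roughly like the sketch below. The group
memberships are filled in from the media groups table in CSS 2.1 for three
media types only, and should be checked against the spec; MEDIA_GROUPS and
selector_matches() are hypothetical names for this example.

```python
# Media groups per media type (subset, per the CSS 2.1 media groups table).
MEDIA_GROUPS = {
    "print": {"paged", "visual", "bitmap", "static"},
    "projection": {"paged", "visual", "bitmap", "interactive"},
    "screen": {"continuous", "visual", "audio", "bitmap",
               "interactive", "static"},
}


def selector_matches(selector, medium):
    """True if an @media selector (type or group) applies to the medium."""
    if selector in ("all", medium):
        return True
    return selector in MEDIA_GROUPS.get(medium, set())


print(selector_matches("paged", "projection"))  # True
print(selector_matches("paged", "screen"))      # False
print(selector_matches("print", "projection"))  # False
```

With this, rules under "@media paged" reach both print and projection, which
a plain "@media print" selector cannot express.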


WORK-IN-PROGRESS:
Besides bug-fixes, I am working on parsing and applying page-context CSS
("@page"), and I need to look into how to handle FRAMESET documents and
documents with IFRAMEs.

`Allan

