[kde-guidelines] Serving Docbook [was The principle of guidelines.kde.org -- one step further]

Sun Oct 3 16:48:38 CEST 2004

Ok, here are some additional details on serving Docbook.

Serving Docbook dynamically is tricky due to its construction. For example, 
transforming one part requires that /all/ sources are parsed, in order to 
resolve references and so forth. Modifications are made on a per-file basis -- 
it's transparent to XML that the document is divided into several physical 
chunks -- but to Docbook it's the whole document that matters. Hence, when 
one file is modified the whole document is invalidated at the higher level.

Cocoon does a good job in caching. For example, files are cached, XIncludes 
are cached, and XML/XSLs are cached. It keeps the DOM structure in memory 
ready to be used (and invalidates it when the file is modified). So, in our 
case, when one source file is modified it doesn't mean that all source files 
are reloaded, only that particular one. When it comes to the actual 
transformation, all sources are always parsed, regardless of what changed.

In other words, there's caching on two levels: file and transformation. It's 
the latter which is troublesome because the sources are huge.
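To make the two levels concrete, here is a rough sketch in Python. It is a toy model, not how Cocoon actually implements its caches: the class name TwoLevelCache and the parse/transform callables are made up for illustration. Each file is re-parsed only when its modification time changes, while the transformation is redone over all sources as soon as any one of them has changed.

```python
import os


class TwoLevelCache:
    """Toy model of file-level and transformation-level caching.

    Hypothetical illustration only; this is not Cocoon's API.
    """

    def __init__(self, paths, parse, transform):
        self.paths = list(paths)
        self.parse = parse          # callable: path -> parsed tree
        self.transform = transform  # callable: list of trees -> output
        self._files = {}            # path -> (mtime_ns, parsed tree)
        self._result = None         # (tuple of mtimes, output)

    def _load(self, path):
        # File level: re-parse only the file whose mtime has changed.
        mtime = os.stat(path).st_mtime_ns
        cached = self._files.get(path)
        if cached is None or cached[0] != mtime:
            cached = (mtime, self.parse(path))
            self._files[path] = cached
        return cached

    def render(self):
        entries = [self._load(p) for p in self.paths]
        stamp = tuple(mtime for mtime, _ in entries)
        # Transformation level: one changed file invalidates the whole
        # result, and the transformation runs over all sources again.
        if self._result is None or self._result[0] != stamp:
            output = self.transform([tree for _, tree in entries])
            self._result = (stamp, output)
        return self._result[1]
```

With this model, touching one chunk costs one re-parse of that chunk plus one full transformation; untouched chunks are served from the file-level cache.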

What does this mean in terms of speed? No idea :) For example, the Docbook 
XSLs are enormous (8 MB) and these are always loaded and ready -- that should 
help a lot, since their DOM doesn't have to be constructed again. That the 
whole source needs to be parsed per transformation sounds bad, but on the 
other hand it only means that the first request is that slow (if it's 
noticeable at all).

I played around with Docbook XSL's rootid[1] parameter, and how it affects 
performance. As Bob Stayton's Docbook XSL book says, it didn't affect 
performance noticeably. I tried it on the Docbook'ed version of the HIG (it's 
larger than one thinks): a run with rootid set to a small part takes 7 
seconds, compared to 11 for the whole document.
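For anyone who wants to repeat that comparison, something along these lines should do. It is only a sketch and assumes xsltproc as the processor, which is not necessarily what was used for the numbers above; the stylesheet path, source file name and the id "intro" are placeholders.

```python
import subprocess
import time


def xsltproc_command(stylesheet, source, rootid=None):
    """Build an xsltproc command line, optionally limited to one element.

    Assumes xsltproc; stylesheet/source paths are supplied by the caller.
    """
    cmd = ["xsltproc", "--xinclude"]
    if rootid is not None:
        # rootid is the Docbook XSL parameter; the value is an element id.
        cmd += ["--stringparam", "rootid", rootid]
    cmd += [stylesheet, source]
    return cmd


def time_run(cmd):
    """Run the command, discard its output, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder file names; adjust to the local setup.
    whole = time_run(xsltproc_command("docbook.xsl", "index.docbook"))
    part = time_run(xsltproc_command("docbook.xsl", "index.docbook", "intro"))
    print(f"whole document: {whole:.1f}s, rootid=intro: {part:.1f}s")
```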

Apart from Cocoon, there's Forrest (also an Apache project), which uses 
Cocoon. Here's some random info about it:

--
Forrest is designed with the new user in mind. Much effort has gone into 
making the process of generating a new site easy and simple.

By separating content from presentation, providing content templates and 
pre-written skins, Forrest is unequalled at enabling content producers to get 
their message out fast. This separation of concerns makes Forrest excellent 
to publish project documentation (notably software projects), intranets, and 
home pages, and anything else you can think of.  

Unique amongst comparable documentation tools, Forrest generates sites that 
can run both interactively as a dynamic web application, or as statically 
rendered pages. 

This provides a path for site growth: start off small and static, and if 
dynamic features (user login, forms processing, runtime data, site search 
etc) are one day needed, these can be accommodated by switching to webapp 
mode. 

Running as a webapp has a major advantage during development: content can be 
written, and then the rendered output viewed almost instantly in a web 
browser. This webapp technique enables Forrest's edit/review cycle to be 
faster than command-line transformation tools. 
--

It is basically Cocoon pragmatically tailored for documentation frameworks. 
People use it for Docbook serving, apparently. It is exactly what we need, 
and it would be practical if we could piggyback on an existing powerful 
solution. It also sounds to be on par, in terms of features, with what we 
need -- be it static or dynamic generation.

But there's still the speed aspect -- in broad terms it sounds nice, but 
what it actually means is yet unknown. I also have other questions, for 
example how navigation would be worked out/integrated. Whether Forrest/Cocoon 
is to be used is still unclear, although it would surprise me if it weren't. 

I will investigate this properly, and next time it will be with a working test 
setup/prototype so we have dead clear results -- we don't want to use a fancy 
monster with tentacles just because it is fancy, has tentacles, and is a 
monster. It will be a while until that happens though; I will probably do it 
when I start cleaning developer.kde.org (it will be ready by the guidelines' 
primetime, so I have plenty of time :).

So what happens now? AFAICT, not much: www/areas/developer is used instead of 
www/areas/guidelines, and we should avoid investing too much in navigation 
for guidelines.kde.org -- standard Docbook should be fine.

Cheers,

		Frans

1.
rootid can be set to the id of an element, and then only that part is written. 
All sources are still parsed, however, so any difference in processing time 
comes from the processor having less serialization (writing the file) to do.