[kde-guidelines] Serving Docbook [was The principle of
guidelines.kde.org -- one step further]
frans.englich at telia.com
Sun Oct 3 16:48:38 CEST 2004
Ok, here are some additional details on serving Docbook.
Serving Docbook dynamically is tricky due to its construction. For example,
transforming one part requires that /all/ sources are parsed, in order to
resolve references and so forth. Modifications are done on a per-file basis --
the division of the document into several physical chunks is transparent to
XML -- but to Docbook it's the whole document that matters. Hence, when one
file is modified, the whole document is invalidated at the higher level.
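To illustrate why the chunking is invisible at the XML level, here is a minimal
sketch using Python's standard xml.etree.ElementInclude, which resolves
XIncludes into one tree. The chunk names, ids and the in-memory `loader` are
hypothetical, and Cocoon uses its own XInclude processor. Once the includes are
resolved, the chunks are simply one document, so a change in any chunk changes
the tree the Docbook stylesheets operate on.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical chunk files, kept in memory instead of on disk for the sketch.
CHUNKS = {
    "intro.xml": "<chapter id='intro'><title>Introduction</title></chapter>",
    "menus.xml": "<chapter id='menus'><title>Menus</title></chapter>",
}

# Hypothetical master document that pulls in the chunks via XInclude.
MASTER = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="intro.xml"/>
  <xi:include href="menus.xml"/>
</book>"""

def loader(href, parse, encoding=None):
    """Return a parsed element for parse='xml', raw text otherwise."""
    text = CHUNKS[href]
    return ET.fromstring(text) if parse == "xml" else text

def assemble(master):
    """Parse the master file and replace every xi:include with its chunk."""
    root = ET.fromstring(master)
    ElementInclude.include(root, loader=loader)
    return root

book = assemble(MASTER)
print([ch.get("id") for ch in book.iter("chapter")])
# prints ['intro', 'menus']
```

After `assemble`, nothing in the tree records which file a chapter came from,
which is why invalidation has to happen for the document as a whole.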
Cocoon does a good job of caching. For example, files are cached, XIncludes
are cached, and XML/XSLs are cached. It keeps the DOM structure in memory
ready to be used (and invalidates it when the file is modified). So, in our
case, when one source file is modified, it doesn't mean that all source files
are reloaded; only that particular one is. When it comes to the actual
transformation, however, all sources are always parsed, regardless of what
changed. In other words, there's caching on two levels: file and
transformation. It's the latter which is troublesome because the sources are
huge.
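The two caching levels can be sketched roughly as follows. This is not Cocoon's
implementation: the class name `TwoLevelCache`, the mtime-based invalidation
and the `transform` callable are assumptions made for illustration. Parsed
files are reused until their own modification time changes, while the
transformation result is keyed on the timestamps of all sources, so touching
any single file forces a full re-transformation on the next request.

```python
import os
import xml.etree.ElementTree as ET

class TwoLevelCache:
    """Hypothetical file-level + transformation-level cache."""

    def __init__(self, transform):
        self.transform = transform  # callable: list of parsed roots -> output
        self._files = {}            # path -> (mtime_ns, parsed root)
        self._result = None         # (key of all mtimes, cached output)
        self.parses = 0             # how many files were actually (re)parsed
        self.transforms = 0         # how many full transformations ran

    def _load(self, path):
        # File level: reparse only if this particular file changed.
        mtime = os.stat(path).st_mtime_ns
        cached = self._files.get(path)
        if cached and cached[0] == mtime:
            return cached[1]
        root = ET.parse(path).getroot()
        self.parses += 1
        self._files[path] = (mtime, root)
        return root

    def render(self, paths):
        roots = [self._load(p) for p in paths]
        # Transformation level: any changed source invalidates the result.
        key = tuple(self._files[p][0] for p in paths)
        if self._result and self._result[0] == key:
            return self._result[1]
        output = self.transform(roots)
        self.transforms += 1
        self._result = (key, output)
        return output
```

Only the first `render` after an edit pays for the full transformation; later
requests are answered from the cached result until some source changes again.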
What does this mean in terms of speed? No idea :) For example, the Docbook
XSLs are enormous (8 MB) and these are always loaded and ready -- that should
help a lot since the DOM doesn't have to be constructed each time. However,
having to parse the whole source per transformation sounds bad, but on the
other hand it only means that the first request is that slow (if it's
noticeable).
I played around with the Docbook XSL rootid parameter and how it affected
performance. As Bob Stayton's Docbook XSL book says, it didn't affect
performance noticeably. I tried it on the Docbook'ed version of the HIG (it's
larger than one thinks): a run with rootid set to a small part takes 7
seconds, compared to 11 seconds for the whole document.
Apart from Cocoon, there's Forrest (also an Apache project), which uses
Cocoon. Here's some random info about it:
Forrest is designed with the new user in mind. Much effort has gone into
making the process of generating a new site easy and simple.
By separating content from presentation, providing content templates and
pre-written skins, Forrest is unequalled at enabling content producers to get
their message out fast. This separation of concerns makes Forrest excellent
to publish project documentation (notably software projects), intranets, and
home pages, and anything else you can think of.
Unique amongst comparable documentation tools, Forrest generates sites that
can run both interactively as a dynamic web application, or as statically
rendered pages. This provides a path for site growth: start off small and
static, and if dynamic features (user login, forms processing, runtime data,
site search etc) are one day needed, these can be accommodated by switching
to webapp mode.
Running as a webapp has a major advantage during development: content can be
written, and then the rendered output viewed almost instantly in a web
browser. This webapp technique enables Forrest's edit/review cycle to be
faster than command-line transformation tools.
It is basically Cocoon pragmatically tailored for documentation frameworks.
People apparently use it for serving Docbook. It is exactly what we need,
and it would be practical if we could piggyback on an existing, powerful
solution. Its feature set also sounds on a par with what we need -- be it
static or dynamic generation.
But there's still the speed aspect -- broadly it sounds nice, but what it
actually means in practice is still unknown. I also have other questions, for
example how navigation would be worked out/integrated. Whether Forrest/Cocoon
is to be used is still unclear, although I would be surprised if it weren't.
I will investigate this properly, and next time it will be with a working test
setup/prototype so we have clear-cut results -- we don't want to use a fancy
monster with tentacles just because it is fancy, has tentacles, and is a
monster. That will take a while, though; I will probably do it when I start
cleaning up developer.kde.org (it will be ready by the guidelines' primetime,
so I have plenty of time :).
So what happens now? AFAICT, not much: www/areas/developer is used instead of
www/areas/guidelines, and we should avoid investing too much in navigation
for guidelines.kde.org -- standard Docbook should be fine.
rootid can be set to the id of an element, and then only that part is written.
All sources are still parsed, however, and any difference in processing time
is hence caused by the processor's serialization (writing the file) have been
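The effect of rootid can be mimicked with Python's standard ElementTree: the
whole document is parsed, and only the serialization step is limited to the
selected element. This is an analogy, not how the Docbook XSL stylesheets
implement rootid; `render_subtree` and the sample ids are hypothetical.

```python
import xml.etree.ElementTree as ET

def render_subtree(source, rootid=None):
    """Hypothetical helper: parse everything, write only the element with id=rootid."""
    root = ET.fromstring(source)  # the full source is parsed regardless of rootid
    if rootid is None:
        target = root
    else:
        # root.iter() walks the whole tree, including the root element itself
        target = next((e for e in root.iter() if e.get("id") == rootid), None)
        if target is None:
            raise ValueError(f"no element with id {rootid!r}")
    target.tail = None  # drop trailing text so only the element itself is written
    return ET.tostring(target, encoding="unicode")

DOC = ("<book id='hig'>"
       "<chapter id='intro'><title>Intro</title></chapter>"
       "<chapter id='menus'><title>Menus</title></chapter>"
       "</book>")

print(render_subtree(DOC, "menus"))
# prints <chapter id="menus"><title>Menus</title></chapter>
```

Parsing cost is identical for both calls; only the amount of output differs,
which is consistent with rootid saving comparatively little time.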