[kde-guidelines] The principle of guidelines.kde.org -- one step further

Lauri Watts <lauri@kde.org>
Wed Sep 29 00:42:39 CEST 2004


On Tuesday 28 September 2004 21.03, Frans Englich wrote:
> On Tuesday 28 September 2004 16:51, Lauri Watts wrote:
> > On Tuesday 28 September 2004 18.12, Frans Englich wrote:
> We no doubt have to consider the performance aspect. Our Docbook
> sources will be huge, the Docbook XSLT is also enormous, and the site
> itself will have a relatively high number of regular visitors, with a
> Slashdotting once in a while. We need good performance, and it's
> critical, no doubt.
>
> I agree with Thomas; I don't think there's a problem with Cocoon just
> because it's Java, as long as we don't hit the memory ceiling (of
> course). Java was once slow, and there are apps which are written to be
> slow, but I think Java in itself is fine. Also, Java is used in many
> large website projects (as evidence that it works, performance-wise).

Yeah, but when it was slow, it was slooooow.  That's about when I last touched 
the stuff :)

> The question of whether Cocoon is suitable comes down to how it handles
> the transformations. We would have references spanning probably 600+
> pages -- are all of those read, the Docbook XSLTs parsed, and all the
> output files written, for each HTTP request? :)

I had an interesting talk with some of the php-doc people quite a while ago, 
and can't quite remember what they used, but I'll try to dig it up and send 
it your way.  I vaguely recall they have a most interesting solution in 
place for their sites, though.

> For example, these different steps could each be cached/optimized or
> not: loading the Docbook XSLs; loading the document sources; writing
> only what is needed; caching the build of a document; caching generated
> content. Some combinations would work, others wouldn't. I'm
> investigating this, and I will deliver a definite answer which we can
> continue from.
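
For what it's worth, the step that usually matters most on that list is 
caching the compiled stylesheets -- parsing the Docbook XSLs dominates 
everything else, so it should happen once, not per request.  Whether Cocoon 
does that is exactly what you'd need to verify.  Outside Java, the idea 
looks like this; a rough sketch with the lxml Python bindings, with made-up 
paths, just to illustrate, not anybody's actual setup:

    from lxml import etree

    # Parse and compile the (enormous) Docbook stylesheet exactly once.
    # This is the expensive step; it must not be repeated per document,
    # let alone per HTTP request.
    transform = etree.XSLT(etree.parse("docbook-xsl/html/docbook.xsl"))

    # Reuse the compiled transform for every source document; only the
    # sources themselves get re-read.
    for src in ("userguide.docbook", "styleguide.docbook"):
        result = transform(etree.parse(src))
        open(src.replace(".docbook", ".html"), "wb").write(bytes(result))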

There's also the option of having a reverse proxy in front of the entire 
thing.  My partner happens to be rather an expert in that field, so I'll ask 
him what he thinks (and what we can do with open source stuff, since his 
stock answer will be "put a Cisco in there", but I know for a fact he's done 
some really big reverse-proxy setups with Squid) :)

> A risk with going for the simpler solution is that it becomes too
> simple. For example, a PDF icon in the top right corner of every page,
> containing the currently viewed section (chunk), would be very
> practical and boost the site's usefulness; but if the generation is
> done statically, it would mean a lot of files at once, and that's not
> fast. The result could then be that features get cut.

So long as the processing is limited to "things which changed since the last 
run", then any solution will be fairly light on resources, and at least 
lighter than dynamic generation.  So I would like to see that as the 
ultimate goal: any page hit receives an already-created page/chunk/PDF or 
whatever, nothing is generated in response to browser requests, but 
everything is up to date as it changes.
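
The "changed since the last run" test needn't be anything fancier than a 
make-style timestamp comparison.  A rough sketch (the file layout and the 
xsltproc invocation are placeholders, not a real setup):

    import os
    import subprocess

    STYLESHEET = "docbook-xsl/html/docbook.xsl"  # placeholder path

    def stale(src, dst):
        """True if the output is missing or older than its inputs."""
        if not os.path.exists(dst):
            return True
        built = os.path.getmtime(dst)
        # A touched stylesheet invalidates everything, not just one doc.
        return built < os.path.getmtime(src) or \
               built < os.path.getmtime(STYLESHEET)

    for src in ("userguide.docbook", "styleguide.docbook"):
        dst = src.replace(".docbook", ".html")
        if stale(src, dst):
            subprocess.check_call(["xsltproc", "-o", dst, STYLESHEET, src])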

I can see if I can get some load and log numbers from Claudiu -- i18n.kde.org 
generates PDF, PS, and a couple of variants of HTML for a few fairly large 
docs (see http://i18n.kde.org/sitedoc.html ), each about the same size as, 
say, one of these guidelines, and it does it whenever there is a CVS checkin.  
It's a PHP script plus a slightly hacked meinproc, and meinproc is just a 
slightly hacked xsltproc, so the numbers would give something solid to 
extrapolate from, to compare against any Cocoon figures you can come up with 
from your sources.

I will probably harp on the load issue until you're insane with it, but 
really -- docs.kde.org doing a complete doc regeneration has been known to 
go seriously sideways and spike the load into three figures; we really, 
really want to avoid that if possible :)

Claudiu's scripts are also available for our use, so they might be something 
to base an initial revision on, to actually get things working (since 
they're already there and, well, they do work).

> > Finally, there's also the advantage that a typo in, say, a stylesheet,
> > or an inadvertently invalid doc, won't make something inaccessible,
> > since the previous iteration is still there.
>
> In either case, I had thought we could let a CVS commit script do the
> validation. That way, only clean Docbook sources get through.

Validity isn't the only problem, unfortunately.  Consider: I have a 
perfectly valid document and a perfectly valid XSLT file right now, which 
just flat out refuse to put headers in any sections.  It's probably due to a 
silly typo or a missing rule, but I can't find it (yet), and it's a one-line 
change, made to fix a bug in the *previous* version of the XSLT, that got it 
this way.  This stuff is extremely robust when it's all finished, but 
getting it there is sometimes a bit fragile :)
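
That said, the commit-time validity check is certainly still worth having, 
since it catches the dumb half of the problem.  A rough sketch of what such 
a hook could run (xmllint and its flags are real; the .docbook suffix and 
the hook wiring are assumptions):

    import subprocess
    import sys

    # Validate each committed Docbook file against the DTD it declares.
    # Exiting non-zero from a CVS commitinfo hook rejects the commit, so
    # only documents that pass xmllint's validation get through.
    failed = False
    for path in sys.argv[1:]:
        if path.endswith(".docbook"):
            if subprocess.call(["xmllint", "--noout", "--valid", path]) != 0:
                failed = True

    sys.exit(1 if failed else 0)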

Regards,
-- 
Lauri Watts
KDE Documentation: http://docs.kde.org
KDE on FreeBSD: http://freebsd.kde.org