[kde-guidelines] The principle of guidelines.kde.org -- one step further

Frans Englich frans.englich at telia.com
Tue Sep 28 21:03:05 CEST 2004


On Tuesday 28 September 2004 16:51, Lauri Watts wrote:
> On Tuesday 28 September 2004 18.12, Frans Englich wrote:
> > On Tuesday 28 September 2004 04:08, Aaron J. Seigo wrote:
> > > On Saturday 25 September 2004 10:07, Frans Englich wrote:
> > > > What do people think?
> > >
> > > this is a vision that i can get behind. i'd still like to commit to
> > > www/areas/guidelines for now since that's already set up and lets us
> > > put a few drafts out before putting it elsewhere
>
> Assuming I understood correctly (leave developer.k.o where it is, create
> guidelines, and then work on moving the developer.k.o site into the
> areas/developer, in a coherent and managed structure, with "docbook inside"
> and then eventually grouping all developer information there - did I
> understand that right?)

Right on.

>
> What Aaron said.
>
> I've very often said that developer.kde.org pretty much needs to be taken
> out and shot though.   There's so much disorganised content there, and it's
> very difficult to find any of it.

Yes, KDE takes a lot of punches for this. Many questions on 
kde-devel (DCOP, XMLGUI) and similar lists come up simply because the docs 
can't be found; they are either hidden on developer.kde.org or buried in the 
CVS modules. Fixing this would also make it easier to get into KDE 
development and join in.

>
> In fact, I'm willing to donate a server space to work on a rebuild.

Nice.

>
> First though, I need to go right now and finish up draft framework so that
> we have something for Aaron to look at (and maybe commit
>
> (boring technical talk from here on in)
>
> The devil is in the details, but the details don't need to be set in stone
> right now.

Indeed. I threw this idea out now so that we have a vague sense of direction 
and won't do work that conflicts with future projects -- for example, 
investing a lot of time in navigation for guidelines.kde.org only to have to 
rework it heavily later to accommodate the developer content. I can't think 
of anything that would affect how we work right now (except the name of the 
CVS directory). So yes -- the details can wait. And it will be a long time 
before I, at least, start on this; I have other KDE projects at the moment.

> I'm not in love with cocoon, specifically - I don't see it 
> solving anything that xsltproc and a cronned make job with some creative
> stylesheets can't solve.  I could be biased here, but my experiences with
> cocoon just haven't been very pretty (and that sun resolver stuff is just
> nasty nasty evil to work with), while I have seen websites bigger than ours
> that build themselves other ways.
>
> I do dislike too much dynamism in a site that is inherently quite static -
> there are many pages on developer.k.o that in reality don't and won't
> change from year to year.  Right now the problem is that you can't *find*
> the content, not that it's changing too fast to keep up with.
>
> So long as all the necessary files are created on a regular basis, a cron
> job to run an update on any files that got a cvs checkin would minimise the
> load. Note that docs.kde.org does *not* run this way (it runs the entire
> make docs script on the entire docs repository, for mainly historic
> reasons) and it can totally bring the webserver to its knees. Since it's
> the same webserver as developer.k.o, I can take a guess how well its
> admins will like us loading it up with more stuff.  (Can you say "not very"
> much?")  Anyway, I'm not just guessing on the kind of load this type of
> work will generate, unfortunately, it's enormous.  XSLT processing, no
> matter what the processor, is highly intensive, doing as little as possible
> of it, and only when required is a good goal for a webserver admin.
>
> The FreeBSD Doc's (and the rest of FreeBSD's site) are sort of
> semi-static/semi-dynamic - they are XML built, and created from an XSLT
> transform.  Much of it is docbook backed, and those parts generate chunked
> up html, "all in one" html, pdf, ps, rtf, and even pdb files to read on
> your palm pilot.  They don't do it dynamically every time you click a link,
> but they do run every few hours, (and could easily be taught to only
> regenerate docs that have changed.)    They do it using the infamously
> just-about-as-hard-to-work-with-as-sun-resolver Jade, but we don't have to.

We do without doubt have to consider the performance aspect. Our DocBook 
sources will be huge, the DocBook XSLT stylesheets are also enormous, and the 
site itself will have a relatively high number of regular visitors, with a 
slashdotting once in a while. Good performance is critical, no doubt.

I agree with Thomas: I don't think Cocoon being Java is a problem, as long as 
we don't hit the memory ceiling (of course). Java was once slow, and there 
are apps that are badly written and slow, but I think Java in itself is fine. 
Also, Java is used in many large website projects -- evidence that it works, 
performance-wise.

The question of whether Cocoon is suitable comes down to how it handles the 
transformations. We would have cross-references over probably more than 600 
pages -- are all of those read, the DocBook XSLTs parsed, and all the output 
files written, for each HTTP request? :)

How well the transformation is optimized doesn't have to be black and white; 
it can be cached or not cached at various levels and still be good enough.

For example, each of these steps could be cached or optimized, or not: 
loading the DocBook XSLs; loading the document sources; writing only what is 
needed; caching the build of a document; caching generated content. Some 
combinations would work, others wouldn't. I'm investigating this and will 
deliver a definite answer that we can continue from.

A Makefile/xsltproc/cron solution definitely has its merits, of course. 
However, it could also hurt performance. For example, if a script runs every 
half hour and generates chunked HTML and PDF output (that's _a lot_ of 
files), that means an enormous load over quite a long period, and delayed 
updates are disturbing for those who are working. Yes, it could be nice'd, 
but then it would take even longer when the web server is under load.
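That said, the cron variant doesn't have to rebuild the whole tree each run. 
A rough Python sketch of the "only regenerate what changed" selection (the 
flat directory layout and .xml/.html naming are my assumptions, purely for 
illustration):

```python
import os

def stale_documents(src_dir, out_dir):
    """Return (source, output) pairs whose DocBook source is newer than
    the generated HTML -- the only files the cron job would need to feed
    to xsltproc, instead of rebuilding everything every half hour."""
    work = []
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(".xml"):
            continue
        src = os.path.join(src_dir, name)
        out = os.path.join(out_dir, name[:-4] + ".html")
        # Rebuild if the output is missing or older than its source.
        if not os.path.exists(out) or os.path.getmtime(src) > os.path.getmtime(out):
            work.append((src, out))
    return work
```

Each returned pair would then go to xsltproc (or whatever processor we pick), 
which keeps the per-run load proportional to what actually changed.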

However, if requests are transformed dynamically and the output cached, that 
would lead to a linear load, where often-visited pages load "statically" 
while the less often visited ones take slightly longer at the first request. 
Perhaps that would be better -- a linear load, and updates instantly 
available. IIRC, Slashdot serves dynamic pages, but it is OK because they are 
cached and the cost falls mainly on rarely visited pages.
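A toy model of that transform-on-first-request behaviour, just to show why 
the load is linear in the number of distinct pages rather than in traffic 
(the "transformation" here is trivially faked so the work can be counted):

```python
class CachingSite:
    """Each distinct page costs exactly one transformation, no matter
    how many times it is requested afterwards."""

    def __init__(self, sources):
        self.sources = sources   # page name -> DocBook source text
        self.cache = {}
        self.transforms = 0      # how many expensive transforms ran

    def handle(self, page):
        if page not in self.cache:
            self.transforms += 1
            # Stand-in for the real XSLT transformation.
            self.cache[page] = "<html>%s</html>" % self.sources[page]
        return self.cache[page]
```

Five hundred hits on the front page would still cost one transform; only the 
long tail of rarely visited pages pays a first-request penalty.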

A risk with going for the simpler solution is that it becomes too simple. 
For example, a PDF icon in the top right corner of every page, containing the 
currently viewed section (chunk), would be very practical and boost the 
site's usefulness; but if the generation is done statically, it would mean 
producing a lot of files at once, and that's not fast. The result could be 
that features get cut.

The site will be big and complex; for example, we will perhaps have a news 
engine. It's possible that the site's needs beyond DocBook are dynamic 
anyway, and that Cocoon would be useful for those. In other words, the 
general flexibility and power Cocoon provides might well be needed elsewhere.

Regarding resources: such a site is very important for KDE and has a large 
long-term impact on productivity, so I don't think it would be difficult to 
find a sponsor for a new server, if that's what we need. Also, if what we 
need is a reasonable server that isn't already bogged down, then that is what 
we should ask for (that's the real problem), rather than compromise what we 
want to achieve.

In other words:

* We need to know a little bit better what we actually will do :)

* We need to know the specific details of Cocoon combined with a huge DocBook 
project; I'll post when I've received some replies and know all the details.


>
> Finally there's also the advantage that a typo in say, a stylesheet, or an
> inadvertently invalid doc, won't make something inaccessible, since the
> previous iteration is still there.

In either case, I had thought we could let a CVS commit script do validation. 
That way, only clean DocBook sources get through.
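As a sketch of what such a hook could check (full DocBook validation would 
need an external validator such as xmllint with the DTD; this stdlib-only 
Python version just catches XML that doesn't even parse):

```python
import sys
import xml.etree.ElementTree as ET

def is_well_formed(path):
    """Pre-commit gate: reject sources that are not well-formed XML,
    so a broken document never reaches the repository in the first place."""
    try:
        ET.parse(path)
        return True
    except ET.ParseError as err:
        # Tell the committer exactly where the parse failed.
        print("%s: %s" % (path, err), file=sys.stderr)
        return False
```

The commit script would simply refuse the commit when this returns False for 
any touched .docbook/.xml file.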


Cheers,

		Frans


