Problems with infrastructure

Jan Kundrát jkt at kde.org
Sun Dec 21 13:01:45 GMT 2014


On Friday, 19 December 2014 22:16:36 CEST, Scarlett Clark wrote:
> Jenkins is compatible 
> and works with Gerrit, so I don't understand why another CI is being 
> considered.

Because when I started this effort this spring, build.kde.org appeared to 
be on life support. I also wanted to expand the number of tools I 
understand, and make sure that I could evaluate various CI tools without the 
baggage of having to stick with a particular CI solution "just because 
we've always done it that way". That's why I started looking at various 
tools, and the whole stack which the OpenStack infrastructure team have 
built [1] looked extremely compelling (note: they still use Jenkins).

The killer feature for me was their support for testing everything, where 
each and every commit that is going to land in a repository is checked to 
make sure it doesn't introduce any regressions. Not on a per-push basis, 
but on a per-commit basis. This has bitten me in the past when I 
occasionally introduced commit series with breakage in the middle, 
corrected only by a subsequent commit. That is bad because it breaks 
`git bisect` when one later starts hunting for an error discovered by 
unrelated testing.
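
Just to make "per-commit" concrete: locally one can approximate this with 
`git rebase --exec "make && make test" origin/master`, which replays the 
series and stops at the first commit which fails. A minimal sketch of the 
same idea in Python follows; it is illustration only, not taken from any 
of the tools discussed here, and the cmake/make/ctest invocations are just 
placeholders for whatever a project actually uses:

    # Illustration only: build and test every commit of a series, not just
    # the tip.  Assumes a CMake-based project; adjust the commands to taste.
    import os
    import subprocess

    def commits_between(base, tip):
        out = subprocess.check_output(
            ['git', 'rev-list', '--reverse', '%s..%s' % (base, tip)])
        return out.decode().split()

    def check_series(base, tip):
        for sha in commits_between(base, tip):
            subprocess.check_call(['git', 'checkout', '--detach', sha])
            # careful: this removes anything untracked in the work tree
            subprocess.check_call(['git', 'clean', '-dxf'])
            os.mkdir('build')
            subprocess.check_call(['cmake', '..'], cwd='build')
            subprocess.check_call(['make'], cwd='build')
            subprocess.check_call(['ctest', '--output-on-failure'],
                                  cwd='build')

    if __name__ == '__main__':
        check_series('origin/master', 'HEAD')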

At the same time, doing per-commit tests means that these tests have to run 
in parallel, and that there must be $something which controls this 
execution. This is well-described at [2], so I would encourage you to read 
this so that you understand what the extra functionality is. To the best of 
my knowledge, Jenkins cannot do that. Based on your level of experience 
with Jenkins, do you think it's possible with Jenkins alone? Would that also 
be possible in a cross-repository manner?
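
To make that extra functionality a bit more concrete, here is a toy model 
of the speculative gating idea from [2]. It is my own simplification, not 
Zuul's actual code: every queued change is tested on top of the changes 
queued ahead of it, and when one of them fails, it is evicted and the 
changes behind it are re-tested without it:

    # Toy model of speculative gating; a simplification of the behaviour
    # described in [2], not Zuul's real implementation.
    def gate(queue, test):
        """queue: list of changes; test(state) -> bool for a merged state."""
        merged = []                       # changes already known to be good
        while queue:
            # Change i is tested as if changes 0..i-1 had already merged.
            states = [merged + queue[:i + 1] for i in range(len(queue))]
            # The real system runs these builds in parallel.
            results = [test(s) for s in states]
            if all(results):
                merged.extend(queue)
                queue = []
            else:
                culprit = results.index(False)
                merged.extend(queue[:culprit])  # everything ahead was fine
                queue = queue[culprit + 1:]     # drop the culprit, re-test
                                                # the rest without it
        return merged

In the toy model at least, the common all-green case needs just one round 
of parallel builds for the whole queue, and a broken change never ends up 
in what actually lands on master.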

Now that we've established the necessity of using something extra to control 
this job execution logic, we still have some use for Jenkins, of course. 
Something has to serve as an RPC tool for actually scheduling the build 
jobs on various servers/VMs/workers/chroots/dockers/whatever. The question 
is whether it still makes sense to use Jenkins at that point, weighing the 
nice features, such as being able to track the history of the number of 
failing tests or having a pretty dashboard pointing to faulty commits, on 
one hand against having to create a ton of XML files with build job 
definitions on the other. Does Jenkins still provide a net positive gain?

The system which I was building had no need for drawing graphs of compiler 
warnings per project over the past two months. What I wanted was an 
efficient system which reports the result of building a proposed change 
back to the rest of the CI infrastructure, to help keep a project's quality 
up to a defined level. The only use of Jenkins in that 
system would be for remotely triggering build jobs, and somehow pushing the 
results back. I do not think that going through all of the Jenkins 
complexity is worth the effort in that particular use case.

BTW, the way in which KDE uses Jenkins right now does not really make use 
of many Jenkins features. The script which prepares the git tree is 
custom. The management of build artifacts is reimplemented by hand as well. 
In fact, most of the complexity is within the Python scripts which control 
the installation of dependencies, mapping of config options into cmake 
arguments etc. These are totally Jenkins-agnostic, and would work just as 
well if run by hand. That's why I'm using them in the CI setup I deployed. 
Thanks for keeping these scripts alive.
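
To illustrate what "Jenkins-agnostic" means here: the scripts take a 
project configuration plus the environment and produce ordinary cmake/make 
invocations, so any CI (or a human at a shell) can call them. Below is a 
hypothetical, much-simplified sketch of such a mapping; the names are made 
up and the real scripts do far more (dependency installation, platform 
handling, ...):

    # Hypothetical sketch of a CI-agnostic config-to-cmake mapping; not the
    # actual KDE scripts, which are considerably more involved.
    import subprocess

    def cmake_arguments(config):
        """Turn a simple per-project config dict into cmake arguments."""
        args = ['-DCMAKE_BUILD_TYPE=%s' % config.get('build_type', 'Debug')]
        if 'install_prefix' in config:
            args.append('-DCMAKE_INSTALL_PREFIX=%s' % config['install_prefix'])
        for key, value in sorted(config.get('cmake_options', {}).items()):
            args.append('-D%s=%s' % (key, value))
        return args

    def build(source_dir, build_dir, config):
        """Configure and build; nothing here cares which CI invoked it."""
        subprocess.check_call(['cmake', source_dir] + cmake_arguments(config),
                              cwd=build_dir)
        subprocess.check_call(['make', '-j4'], cwd=build_dir)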

So in the end, I had a choice of either using Jenkins only to act as a dumb 
transport of commands like "build KIO's master branch" and responses such 
as "build is OK", or bypassing Jenkins entirely and using $other_system. If 
the configuration of the $other_system was easier than Jenkins', then it 
seemed a good idea to try it out. Because I was already using Zuul 
(see the requirement for "trunk gating" and speculative execution of 
dependent jobs as described in [2]), and Zuul uses Gearman for its 
RPC/IPC/messagebus needs, something which just plugs into Gearman made a 
lot of sense. And it turned out that there is such a project, the 
Turbo-Hipster thing. I gave it a try. I had to fix a couple of bugs and add 
some features (SCPing the build logs, for one), and I'm quite happy with 
the result.
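
For the curious, "plugging into Gearman" is cheap because a Gearman worker 
is little more than a process which registers a function name with the 
Gearman server and waits for jobs. Here is a bare-bones sketch using the 
python-gearman library; this is not Turbo-Hipster's code (TH also streams 
status back, uploads logs, handles plugins, ...), and the job name, payload 
layout and build script are simplified assumptions for this sketch:

    # Bare-bones Gearman worker; roughly what "plugs into Gearman" means.
    # Not Turbo-Hipster's code; the job name, the payload layout and
    # build-and-test.sh are assumptions made for this sketch.
    import json
    import os
    import subprocess
    import gearman   # python-gearman

    def run_build(worker, job):
        params = json.loads(job.data)    # build parameters from the scheduler
        env = dict(os.environ)
        env.update((str(k), str(v)) for k, v in params.items())
        ret = subprocess.call(['./build-and-test.sh'], env=env)
        return json.dumps({'result': 'SUCCESS' if ret == 0 else 'FAILURE'})

    worker = gearman.GearmanWorker(['localhost:4730'])
    worker.register_task('build:check-trojita', run_build)
    worker.work()   # block forever, picking up build jobs as they arrive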

But please keep in mind that this is just about how to launch the build 
jobs. TH's involvement in how an actual build is performed is limited to a 
trivial shell script [3]. And by the way, the OpenStack CI which inspired 
me to do this is still using Jenkins for most of its build 
execution needs.

Anyway, following these discussions, I don't think you have many reasons 
to fear that even the part of your work which is 100% Jenkins-specific 
would go unused. There appears to be huge inertia towards sticking with 
whatever we're using right now Just Because™, even if all technical 
problems are/were solved. This effort started ages 
ago with an "I would like to have multiplatform, early CI for Trojita", and 
now we're discussing "replacing the project management tool within KDE". 
I'm being told requirements such as "it would be cool to have a single tool 
doing everything" or "the CI has to support Mercurial and SVN". Other 
people feel threatened by my work, and yet other people believe that having 
alternatives is a bad thing.

See, I'm happy that I can now finally use tools which make my work on my 
pet project reasonably efficient. I also think that it's cool to offer 
these tools to other people within KDE. Based on face-to-face discussion 
during Akademy, people who were present appeared to be interested, so I did 
the work on this. I fully integrated this stack with the rest of KDE's 
infrastructure, and made sure that all KDE projects can use these tools if 
they want to. The technical solution is now ready, and it's up to the 
project maintainers to decide whether they want it or not. If they want to 
stick with what they have, more power to them. If they want to wait an 
unspecified amount of time until a general decision is reached, more power 
to them, too. If they want to switch now, more power to them as well.

It would be great to have alignment and for all of us to use the same tool. 
However, I seriously doubt that alignment can be reached without either 
inducing emotional stress on one subset of our community or compromising 
the efficiency of another. Is either of those a price we want to pay? I 
suppose we might know after a long discussion.

The system which is now made available through Gerrit is suitable for 
projects that want to make sure the quality of their code is and remains 
excellent, who consider a failing test a serious problem which must be 
fixed, whose approach to compilers telling them "hey, there's a problem 
with your code" is either "doh, you're right, let's fix it" or "nah, you're 
wrong, let's silence this warning forever, so that we can cut the noise 
down", and who are happy to let a CI system veto their commits if they 
introduce detectable breakage. It might not be a good fit for people who 
*insist* that the only way of working is to 
push-to-master-as-you-go-without-even-a-build-test-because-you-dont-make-mistakes, 
and that flaky tests cannot be fixed and are a necessity of SW 
development. Maybe it will need some reasonably well-contained additions to 
support their use cases. Also, not every project which subscribes to these 
quality values must use Gerrit, of course.

That system is ready to be used now. I promised I won't be pushing this to 
people who don't care, and I intend to keep that promise. If you're a 
project maintainer and you would like your project in Gerrit, you have my 
support. If, however, you would like to revamp the review/CI/... platform 
used throughout KDE as a whole, well, more power to you as well. I'll be 
happy to help, but at this point I cannot imagine myself being a driving 
force behind this. I expect my role to be limited to correcting factual 
mistakes in Gerrit/Zuul descriptions from now on and helping iron out 
possible problems with this setup. I, too, have put quite long hours into 
making KDE's CI better, and so far I've felt substantial hostility for 
daring to propose an alternative to How Stuff Has Always Been Done.

With kind regards,
Jan

[1] http://ci.openstack.org/
[2] http://ci.openstack.org/zuul/gating.html
[3] 
http://quickgit.kde.org/?p=sysadmin%2Fgerrit-project-config.git&a=blob&f=scripts%2Fbuild-kf5qt5.sh

-- 
Trojitá, a fast Qt IMAP e-mail client -- http://trojita.flaska.net/



