Another proposal for modernization of our infrastructure
jkt at kde.org
Wed Jan 28 11:27:06 GMT 2015
On Wednesday, 28 January 2015 10:08:54 CEST, Ben Cooksley wrote:
> 1) Most applications integrate extremely poorly with LDAP. They
> basically take the details once on first login and don't sync the
> details again after that (this is what both Chiliproject and
> Reviewboard do). How does Gerrit perform here?
Data are fetched from LDAP as needed. There's a local cache for speedup
(with configurable TTL and support for explicit flushes).
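For illustration, these cache lifetimes are tuned in `gerrit.config`; the section and key names below are real Gerrit options, but the concrete values are only an example:

```ini
# Example only: keep LDAP lookups cached for an hour before re-fetching.
[cache "ldap_usernames"]
  maxAge = 1 hour
[cache "ldap_groups"]
  maxAge = 1 hour
```

An explicit flush is available via the SSH admin interface, e.g. `gerrit flush-caches --cache ldap_groups`.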
> 2) For this trivial scratch repository script, will it offer it's own
> user interface or will developers be required to pass arguments to it
> through some other means? The code you've presented thus far makes it
> appear some other means will be required.
I might not fully understand this question -- I thought we already discussed
this. The simplest method of invocation can be as easy as `ssh
user at somehost create-personal-project foobar`, with SSH keys verified by
OpenSSH. This is the same UI as our current setup. There are other options,
some of them with fancy, web-based UIs.
> 3) We've used cGit in the past, and found it suffered from performance
> problems with our level of scale. Note that just because a solution
> scales with the size of repositories does not necessarily mean it
> scales with the number of repositories, which is what bites cGit. In
> light of this, what do you propose?
An option which is suggested in the document is to use our current quickgit
setup, i.e. GitPHP. If it works, there's no need to change it, IMHO, and
sticking with it looks like a safe thing to me. But there are many
additional choices (including gitiles).
> 4) Has Gerrit's replication been demonstrated to handle 2000 Git
> repositories which consume 30gb of disk space? How is metadata such as
> repository descriptions (for repository browsers) replicated?
Yes, Gerrit scales far beyond that. See e.g. the thread at
https://groups.google.com/forum/#!topic/repo-discuss/5JHwzednYkc for real
users' feedback about large-scale deployments.
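As a sketch of what such a setup could look like: Gerrit's replication plugin is driven by a `replication.config` file. The option names below are the plugin's real ones; the hostnames and paths are hypothetical:

```ini
# Hypothetical mirroring to two anongit nodes; ${name} expands to the
# repository name, and mirror = true also propagates ref deletions.
[remote "anongit"]
  url = git@anongit1.example.org:/repos/${name}.git
  url = git@anongit2.example.org:/repos/${name}.git
  push = +refs/heads/*:refs/heads/*
  push = +refs/tags/*:refs/tags/*
  mirror = true
  replicationDelay = 15
```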
> 5) If Gerrit or it's hosting system were to develop problems how would
> the replication system handle this? From what I see it seems highly
> automated and bugs or glitches within Gerrit could rapidly be
> inflicted upon the anongit nodes with no apparent safe guards (which
> our present custom system has). Bear in mind that failure scenarios
> can always occur in the most unexpected ways, and the integrity of our
> repositories is of paramount importance.
I agree that one needs proper, offline and off-site backups for critical
data, and that any online Git replication is not a proper substitute for
this. The plan for disaster recovery therefore is "restore from backup".
In terms of Gerrit, this means backing up all of the Git repositories and
dumping the PostgreSQL database, and storing these in a location which
cannot be wiped out or modified by an attacker who has root on the main Git
server, or by a software bug in our Git hosting. One cannot get that with
just Git replication, of course.
What are the safeguard mechanisms that you mentioned, and what threats do they
mitigate? I'm asking because e.g. the need for frequent branch deletion is
minimized by Gerrit's code review process, which tracks proposed changes on
internal refs. What risks do you expect to see here?
> 6) Notifications: Does it support running various checks that our
> hooks do at the moment for license validity and the like? When these
> rules are tripped the author is emailed back on their own commits.
Yes, the proposed setup supports these. The best place for implementing
them is via CI invocation through the ref-updated hook. My personal
preference would be a ref-updated event handler in Zuul to ensure proper
scalability, but there are other options.
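As a rough sketch of such a handler, assuming the usual shape of Gerrit's stream-events JSON; the `trigger_build` callback is a hypothetical stand-in for the actual CI submission:

```python
import json

def handle_event(line, trigger_build):
    """Dispatch one line of Gerrit's event stream; start CI on branch updates.

    Returns True when a build was triggered, False otherwise.
    """
    event = json.loads(line)
    if event.get("type") != "ref-updated":
        return False
    ref = event["refUpdate"]
    # Only branch updates should start builds, not tags or meta refs.
    if not ref["refName"].startswith("refs/heads/"):
        return False
    trigger_build(ref["project"], ref["refName"], ref["newRev"])
    return True
```

In a real deployment this would sit behind `ssh gerrit stream-events` or Zuul's own Gerrit connection; the sketch only shows the dispatch logic.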
> 7) Storing information such as tree metadata location within
> individual Git repositories is a recipe for delivering a system that
> will eventually fail to scale, and will abuse resources. Due to the
> time it takes to fork out to Git,
Gerrit uses JGit, a Java implementation of Git. There are no forks.
> plus the disk access necessary for
> it to retrieve the information in question, I suspect your generation
> script will take several load intensive minutes to complete even if it
> only covers mainline repositories. This is comparable to the
> performance of Chiliproject in terms of generation at the moment.
Gerrit 2.10, released yesterday, adds a REST API for fetching arbitrary data
from files stored in Git, with aggressive caching. I would like to use that
for generating the kde_projects.xml file.
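A minimal sketch of consuming that API: Gerrit prefixes its JSON responses with an XSSI-protection marker, and the "Get Content" endpoint returns file bodies base64-encoded, so a client needs two small decoding steps (the fetching itself is omitted here):

```python
import base64
import json

GERRIT_JSON_PREFIX = ")]}'"

def parse_gerrit_json(raw):
    """Strip Gerrit's XSSI-protection prefix and parse the JSON payload."""
    if raw.startswith(GERRIT_JSON_PREFIX):
        raw = raw[len(GERRIT_JSON_PREFIX):]
    return json.loads(raw)

def decode_file_content(raw):
    """Decode a file body as returned by Gerrit's 'Get Content' endpoint."""
    return base64.b64decode(raw).decode("utf-8")
```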
> The original generation of our Git hooks invoked Git several times per
> commit, which meant the amount of time taken to process 1000 commits
> easily reached 10 minutes. I rebuilt them to invoke git only a handful
> of times per push - which is what we have now.
Gerrit has a different architecture with no forks and aggressive caching.
I'm all for benchmarking, though. Do you want a test repository to run your
benchmarks against?
> 8) Shifting information such as branch assignments in the same manner
> will necessitate that someone have access to a copy of the Git
> repository to determine the branch to use. This is something the CI
> system cannot ensure, as it needs to determine this information for
> dependencies, and a given node may not have a workspace for the
> repository in question. It also makes it difficult to update rules
> which are common among a set of repositories such as those for
> Frameworks and Plasma (Workspace). I've no idea if it would cause
> problems for kdesrc-build, but that is also a possibility.
The kde_projects.xml file, which stores a copy of these data, will remain
unchanged, and it should remain the place consulted by e.g. the CI scripts or
kdesrc-build. These tools will need no changes. What the proposal says is to
generate that file from data stored in Git rather than from a custom webapp.
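A sketch of what such a generator could look like; the element names below are illustrative, not the real kde_projects.xml schema, and the per-repository metadata is assumed to have been fetched already (e.g. via the REST API):

```python
import xml.etree.ElementTree as ET

def build_projects_xml(projects):
    """Render an illustrative kde_projects.xml-like document from a mapping
    of repository name -> metadata dict (keys 'description' and 'url')."""
    root = ET.Element("kdeprojects")
    for name, meta in sorted(projects.items()):
        project = ET.SubElement(root, "project", identifier=name)
        ET.SubElement(project, "description").text = meta["description"]
        repo = ET.SubElement(project, "repo")
        ET.SubElement(repo, "url", protocol="git").text = meta["url"]
    return ET.tostring(root, encoding="unicode")
```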
> 9) You've essentially said you are going to eliminate our existing
The proposal said that it might be possible to replace a large part of the
functionality with Gerrit's native features, at zero maintenance cost. If the
remaining functionality (CRLF line-ending checks and author name checks for
direct pushes) is important enough to warrant ongoing maintenance of the
custom hooks, they can still be run without a problem.
> Does Gerrit support:
> a) line ending checks, with exceptions for certain file types and
The proposal suggests handling this in the CI setup. In other words, pushing
CRLF data to our repos would be allowed, with a follow-up e-mail saying "hey,
you're doing a bad thing". That's a trade-off for not having to maintain
these scripts.
Alternative options for this include:
- preserving this part of the hooks and running them from Gerrit,
- extending an existing Git validation plugin to do this.
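If we do keep this part of the hooks, the check itself is small. A sketch, where the exempt suffix list is only an example of the "certain file types" exception:

```python
def crlf_violations(files, exempt_suffixes=(".bat", ".sln", ".vcproj")):
    """Return the names of changed files containing CRLF line endings,
    skipping file types where CRLF is expected (Windows-specific files).

    `files` maps file names to their blob contents as bytes.
    """
    bad = []
    for name, blob in files.items():
        if name.endswith(exempt_suffixes):
            continue
        if b"\r\n" in blob:
            bad.append(name)
    return sorted(bad)
```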
> b) Convenient deactivation of these checks if necessary.
Yes, this is configurable.
> c) Checks on the author name to ensure it bears a resemblence to
> something actual.
No, the author's name is not checked at the moment. If we decide to change
this, it's going to be a couple-line patch, or a custom hook.
However, I do not think that checking names in the way the hooks do it now
is actually a good thing. Please read http://wookware.org/name.html for an
example of a real person from the UK who cannot commit to KDE.
The potential for mistakes is largely mitigated by checks for e-mail
validity. In order for this to be a problem, one would have to push a
commit with a valid e-mail address, but wrong name ("jkt <jkt at kde.org>").
We should evaluate whether risking this is worth the reduced maintenance.
Also, this only affects direct pushes and KDE developers. Patch proposals
from third parties can be easily and immediately downvoted by the CI, with
a helpful message on what to fix.
> d) Prohibiting anyone from pushing certain types of email address
> such as *@localhost.localdomain?
A similar check applies to e-mail validation. An ACL verifies whether an
e-mail matches one of the user's registered addresses. These addresses are either
read from LDAP, or validated by a mail probe to make sure that they
actually exist and belong to the user in question. This validation can be
configured on an LDAP group basis, so it is possible to allow KDE
developers to push commits on behalf of third-party contributors while
preventing regular users from faking their identity.
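A sketch of that policy, assuming the registered addresses have already been fetched from LDAP or confirmed by a mail probe; the forbidden-domain list is illustrative:

```python
# Illustrative: domains that should never appear in a committer address.
FORBIDDEN_DOMAINS = {"localhost", "localhost.localdomain"}

def may_push(committer_email, registered_emails):
    """Accept a commit only if its committer address is syntactically sane,
    not from a forbidden domain, and registered to the pushing user."""
    if "@" not in committer_email:
        return False
    domain = committer_email.rsplit("@", 1)[1].lower()
    if domain in FORBIDDEN_DOMAINS:
        return False
    return committer_email in registered_emails
```

A group-based exception (e.g. KDE developers pushing commits on behalf of third parties) would skip the final membership test for trusted pushers.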
> 10) You were aware of the DSL work Scarlett is doing and the fact this
> is Jenkins specific (as it generates Jenkins configuration). How can
> this work remain relevant?
> Additionally, Scarlett's work will introduce declarative configuration
> for jobs to KDE.
My understanding of Scarlett's work is that it aims at cleaning up our
current configuration, making it work on Windows and OS X, and to introduce
a declarative language for preparing job descriptions. AFAIK, the only part
which might be Jenkins-specific is the last bit, and I fully expect that a
declarative generator can produce job descriptions for another system simply
by adding a new output format. Moving to a declarative approach is the big
change here; adding another output format is much less work.
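To illustrate why another output is comparatively cheap, here is a toy declarative job description rendered into two deliberately simplified backend formats; neither output string matches the real Jenkins or Zuul syntax, the point is only that the declarative data stays shared:

```python
def to_jenkins(job):
    """Render a declarative job description as a toy Jenkins-style
    XML fragment (simplified, not the real config.xml schema)."""
    return "<project><name>%s</name><command>%s</command></project>" % (
        job["name"], " && ".join(job["steps"]))

def to_zuul(job):
    """Render the same description as a toy Zuul-style YAML fragment
    (simplified, not the real layout syntax)."""
    lines = ["- job:", "    name: %s" % job["name"], "    run:"]
    lines += ["      - %s" % step for step in job["steps"]]
    return "\n".join(lines)
```

Only the renderers are backend-specific; the job description itself is a plain data structure shared by both.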
> 11) We actually do use some of Jenkins advanced features, and it
> offers quite a lot more than just a visual view of the last failure.
> As a quick overview:
> a) Tracked history for tests (you can determine if a single test
> is flaky and view a graph of it's pass/fail history).
Please see section 3.3.2, which discusses possible ways of dealing with
flaky tests. IMHO, the key feature and our ultimate goal is "let's handle
flaky tests efficiently", not "let's have a graph of failing tests" (how
would that work with a non-linear history of pre-merge CI?).
> b) Log parsing to track the history of compiler warnings and other
> matters of significance (this is fully configurable based on regexes)
That's in section 3.3.3. One option is to make the build warning-free on one
well-known platform and to enforce -Werror there.
> c) Integrated cppcheck and code coverage reports, actively used by
> some projects within KDE.
The Zuul-based CI setup launches KDE's existing build scripts and delivers
their output. I chose to disable cppcheck for simplicity, and because no
projects currently in Gerrit are covered by Jenkins' cppcheck on
build.kde.org at this time. There is no reason not to enable cppcheck runs
again, of course. When I last looked at it, however, it seemed that the
include paths were not being passed properly and the data I got back were
clearly bogus, so I decided to skip it for now. The same applies to coverage
reports. Both will be provided, of course.
> d) Intelligent dashboards which allow you to get an overview of a
> number of jobs easily.
> Bear in mind that these intelligent dashboards can be setup by anyone
> and are able to filter on a number of conditions. They can also
> provide RSS feeds and update automatically when a build completes.
> How would Zuul offer any of this? And how custom would this all have
> to be? Custom == maintenance cost.
The report explicitly acknowledges the need for future work on this status
matrix, and proposes how to get there (section 3.3.4).
Regarding the maintenance costs, let's wait until it is ready and evaluate
the maintenance burden at that point.
> Addendum: the variations, etc. offered by the Zuul instance which
> already exists in the Gerrit clone are made possible by the hardware
> resources Jan has made available to that system. Jenkins is fully
> capable of offering such builds as well with the appropriate setup,
> some of which are already used - see the Multi Configuration jobs such
> as the ones used by Trojita and Plasma Framework.
I believe that it is not about HW resources, but about services'
configuration. Does KDE's Jenkins as-is support building against a
systemwide version of Qt, for example?
> You've lost me i'm afraid with the third party integration - please
> clarify what you're intending here.
I am pointing out that it is easy to plug a third-party testing system into
Gerrit/Zuul, mainly due to the open APIs and the system's architecture. If
e.g. one of the FreeBSD guys wanted to help, they would have a way of getting
involved without any explicit action from sysadmins. To me, that lowers the
barrier to entry a bit, and it also frees up some sysadmin time for more
important tasks, so I think it's a benefit of such a setup.
> 12) The tone of the way the event stream feature is mentioned makes it
> sound like sysadmin actively prevents people from receiving the
> information they need. We have never in the past prevented people from
> receiving notifications they've requested - you yourself have one that
> triggers builds on the OBS for Trojita.
It was never my intention to imply anything like that; sorry for this. That
section says that it requires manual effort from sysadmins and custom code.
In contrast to that, the proposed setup enables anyone to listen for events
in a machine-readable way without any prior effort from sysadmins to enable it.
> 13) You've used the terminology "we" throughout your document. Who are
> the other author(s)?
I think this is similar to the previous report. I received feedback about
this paper from several developers. Due to the rather heated nature of the
previous rounds of the discussion and some personal attacks, they preferred
to not be credited as authors. The actual wording is mine, I wrote the
text, so I'm listed as the only author.
Anyway, I hope that we'll be able to judge the merits of the individual
proposals, and that this won't deteriorate into a popularity contest.
Trojitá, a fast Qt IMAP e-mail client -- http://trojita.flaska.net/