CI congestion/starvation
Albert Astals Cid
aacid at kde.org
Sat Mar 7 17:30:11 GMT 2026
On Saturday, 7 March 2026, at 14:41:51 (Central European Standard Time),
Alexander Semke wrote:
> On 01/03/26 01:31, Ben Cooksley wrote:
> > On Sun, Mar 1, 2026 at 10:46 AM Johnny Jazeix <jazeix at gmail.com> wrote:
> > On Sat, 28 Feb 2026 at 22:24, Ingo Klöcker <kloecker at kde.org> wrote:
> > > On Saturday, 28 February 2026 20:53:38 Central European Standard Time,
> > > Johnny Jazeix wrote:
> > > > Hi,
> > > > today we also have a lot of congestion. After discussion with Ben,
> > > > it's due to a new Gear update which uses the resources of the CI for
> > > > multiple hours.
> > > > Would it be possible to spread the changes done to each repo during a
> > > > full day (with sleeps between each git push) instead of doing them at
> > > > once to let other projects use the CI?
> > >
> > > You do realize that this would mean that the people who do our releases
> > > would have to sit the full day in front of their computer?
> >
> > I don't know the exact process, but I guess the pushes are not all
> > done manually but via a script?
> > How often is there an error requiring human intervention? If never,
> > the script can run in the background and the person can get on with
> > their life?
> >
> > Putting sleeps in between each push would make release preparation
> > activities quite difficult, as pushing the version bumps is just one
> > part of the process.
> >
> > > A Gear release happens once a month. I really don't think that's a big
> > > problem. (Yes, there's also Plasma, but I think that's a lot fewer
> > > projects, and Frameworks.) Just make sure that you don't plan a release
> > > of a non-Gear project around the release date of Gear (or Plasma or
> > > Frameworks). Marketing-wise it's anyway better to avoid such a collision.
> >
> > You aren't, but other people are impacted. Maybe we can run these heavy
> > processes at a "better" time when fewer developers are active (I guess
> > we can get stats from the CI usage)?
> >
> > It took the CI nodes approximately 10 hours to work through all of the
> > builds for the record (they're just finishing up now, from when they
> > were triggered at 2pm UTC).
> > That includes all the other builds they also received during that time
> > they would normally service.
> >
> > During this time the CI nodes completed a total of 5,211 builds, with
> > the vast majority of these jobs completing either in a matter of
> > seconds (for the JSON/XML/etc validation jobs) or in the space of a
> > few minutes (for conventional CI and CD jobs).
> > 4,807 of those took less than 10 minutes (160 hours of CI time), 346
> > of them took between 10 and 25 minutes (85 hours of CI time) and 77 of
> > them took more than 25 minutes (55 hours of CI time) for a total of
> > 301 CI hours (difference of 1 hour due to rounding).
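
To make the breakdown above easier to check, here is a minimal Python sketch that re-adds the three duration buckets and derives each bucket's share of CI time and its mean job length. The figures are taken as quoted; the names (`buckets`, `summarize`) are made up for illustration and are not part of any KDE tooling. Note that the three bucket counts as quoted add up to 5,230 rather than the 5,211 total mentioned earlier; the sketch does not try to explain that gap.

```python
# Buckets exactly as quoted above: (label, number of builds, CI hours).
# All names here are hypothetical, chosen only for this illustration.
buckets = [
    ("under 10 min", 4807, 160),
    ("10 to 25 min", 346, 85),
    ("over 25 min", 77, 55),
]


def summarize(rows):
    """Return (total builds, total CI hours, per-bucket details)."""
    total_builds = sum(count for _, count, _ in rows)
    total_hours = sum(hours for _, _, hours in rows)
    details = []
    for label, count, hours in rows:
        share = hours / total_hours          # fraction of all CI time
        mean_minutes = hours * 60 / count    # average job length in the bucket
        details.append((label, share, mean_minutes))
    return total_builds, total_hours, details


if __name__ == "__main__":
    builds, hours, details = summarize(buckets)
    print(f"{builds} builds, {hours} CI hours (rounded per bucket)")
    for label, share, mean_minutes in details:
        print(f"{label:>13}: {share:6.1%} of CI time, "
              f"about {mean_minutes:.1f} min per build")
```

Read this way, the small number of long-running jobs matters more than their count suggests: by the quoted figures, the 10-minutes-and-up buckets account for close to half of the CI time.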
> >
> > During this we had conventional Linux CI jobs that completed in under
> > a minute (which includes VM provisioning, cloning sources, unpacking
> > dependencies, configure, build, install, publishing build artifacts,
> > and running tests) as well as jobs for other OSes completing in 2-3
> > minutes.
> >
> > In terms of optimisation, the CI jobs enabled for pim/pim-sieve-editor
> > need to be reviewed, as it is running inappropriate jobs considering
> > the nature of that repository.
> > The results of those runs contributed to 2 hours of wasted CI time.
> >
> > Data for all this is attached.
>
> Today the waiting time on CI looks very long again. Looking at the
> attached statistics, I think more things should be reviewed and
> optimized.
As announced on this mailing list, we branched KDE Gear yesterday. This means
triggering jobs for 251 repositories, which are going to take a while to
process.
A bit of patience goes a long way.
CI has a 2-minute wait time at the moment (except the macOS builder, which is
a bit more backlogged).
Albert
>
>
> build_sphinx_app_docs for docs-kdenlive-org failed after 2h (timeout?)
> and looks expensive in general:
>
> https://invent.kde.org/documentation/docs-kdenlive-org/-/jobs?kind=BUILD
>
>
> There are also multiple Qt5 builds (especially the expensive and failing
> builds for krita) - do we still need to support Qt5?
>
> > That means it is actually not possible to make it non-disruptive, as
> > doing it at a different time would just be a means of favouring one
> > timezone (say EU) over others - it simply takes a significant amount
> > of time to rebuild the world (which is essentially what a Gear release
> > entails).
>
> If we collect these statistics now for a couple of weeks/months,
> basically the data you attached in the previous email but also with the
> start times, we'll see the distributions across different days and time
> frames and would also be able to calculate the "degree of concurrency on
> CI" - this would allow us to move such peak loads and infrequent
> expensive builds into more idle time frames.
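
The "degree of concurrency" idea above can be sketched with a simple sweep over job start and end events. This is only an illustration, assuming each job record has a start time and a duration; the function name `peak_concurrency` and the sample data are hypothetical and do not come from the attached statistics or from any GitLab API.

```python
from datetime import datetime, timedelta


def peak_concurrency(jobs):
    """Return (peak number of simultaneously running jobs, when it was reached).

    `jobs` is an iterable of (start: datetime, duration: timedelta) pairs.
    """
    events = []
    for start, duration in jobs:
        events.append((start, 1))               # job begins
        events.append((start + duration, -1))   # job ends
    # Sort by time; at equal timestamps, ends (-1) sort before starts (+1),
    # so a job finishing at 14:20 does not overlap one starting at 14:20.
    events.sort(key=lambda event: (event[0], event[1]))

    running = 0
    peak = 0
    peak_at = None
    for when, delta in events:
        running += delta
        if running > peak:
            peak, peak_at = running, when
    return peak, peak_at


# Hypothetical sample data, for illustration only.
sample_jobs = [
    (datetime(2026, 3, 7, 14, 0), timedelta(minutes=30)),
    (datetime(2026, 3, 7, 14, 10), timedelta(minutes=10)),
    (datetime(2026, 3, 7, 14, 15), timedelta(minutes=20)),
    (datetime(2026, 3, 7, 14, 20), timedelta(minutes=5)),
]

if __name__ == "__main__":
    peak, when = peak_concurrency(sample_jobs)
    print(f"peak of {peak} concurrent jobs, first reached at {when}")
```

Running the same sweep per weekday or per hour of day over a few weeks of data would give the distribution mentioned above. Whether that leaves any genuinely idle window is a separate question the sketch cannot answer.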
More information about the kde-devel mailing list