CI congestion/starvation
Alexander Semke
alexander.semke at web.de
Sat Mar 7 13:41:51 GMT 2026
On 01/03/26 01:31, Ben Cooksley wrote:
> On Sun, Mar 1, 2026 at 10:46 AM Johnny Jazeix <jazeix at gmail.com> wrote:
>
> On Sat, 28 Feb 2026 at 22:24, Ingo Klöcker <kloecker at kde.org> wrote:
> >
> > On Saturday, 28 February 2026 20:53:38 Central European Standard Time
> > Johnny Jazeix wrote:
> > > Hi,
> > > today we also have a lot of congestion. After discussion with Ben,
> > > it's due to a new Gear update which uses the resources of the CI for
> > > multiple hours.
> > > Would it be possible to spread the changes done to each repo during a
> > > full day (with sleeps between each git push) instead of doing them at
> > > once to let other projects use the CI?
> >
> > You do realize that this would mean that the people who do our releases
> > would have to sit the full day in front of their computer?
> >
>
> I don't know the exact process, but I guess the pushes are not all
> done manually but via a script?
> How often is there an error requiring human intervention? If the answer
> is never, the script can run in the background and the person can get on
> with their day?
>
>
> Putting sleeps in between each push would make release preparation
> activities quite difficult, as pushing the version bumps is just one
> part of the process.
>
>
> > A Gear release happens once a month. I really don't think that's a big
> > problem. (Yes, there's also Plasma, but I think that's a lot less projects,
> > and Frameworks.) Just make sure that you don't plan a release of a non-Gear
> > project around the release date of Gear (or Plasma or Frameworks).
> > Marketing-wise it's anyway better to avoid such a collision.
> >
>
> You aren't, but other people are impacted. Maybe we can run these heavy
> processes at a "better" time when fewer developers are active (I guess
> we can get stats from the CI usage)?
>
>
> It took the CI nodes approximately 10 hours to work through all of the
> builds for the record (they're just finishing up now, from when they
> were triggered at 2pm UTC).
> That includes all the other builds they also received during that time
> they would normally service.
>
> During this time the CI nodes completed a total of 5,211 builds, with
> the vast majority of these jobs completing either in a matter of
> seconds (for the JSON/XML/etc validation jobs) or in the space of a
> few minutes (for conventional CI and CD jobs).
> 4,807 of those took less than 10 minutes (160 hours of CI time), 346
> of them took between 10-25 minutes (85 hours of CI time) and 77 of
> them took more than 25 minutes (55 hours of CI time) for a total of
> 301 CI hours (difference of 1 hour due to rounding).
>
> During this we had conventional Linux CI jobs that completed in under
> a minute (which includes VM provisioning, cloning sources, unpacking
> dependencies, configure, build, install, publishing build artifacts,
> and running tests) as well as jobs for other OSes completing in 2-3
> minutes.
>
> In terms of optimisation, the CI jobs enabled for pim/pim-sieve-editor
> need to be reviewed, as it is running inappropriate jobs considering
> the nature of that repository.
> The results of those runs contributed to 2 hours of wasted CI time.
>
> Data for all this is attached.
The waiting time on CI looks very long again today. Looking at the
attached statistics, I think more things should be reviewed and
optimized.
build_sphinx_app_docs for docs-kdenlive-org failed after 2h (timeout?)
and looks expensive in general:
https://invent.kde.org/documentation/docs-kdenlive-org/-/jobs?kind=BUILD
There are also multiple Qt5 builds (especially the expensive and failing
builds for krita) - do we still need to support Qt5?
>
> That means it is actually not possible to make it non-disruptive, as
> doing it at a different time would just be a means of favouring one
> timezone (say EU) over others - it simply takes a significant amount
> of time to rebuild the world (which is essentially what a Gear release
> entails).
If we collect these statistics for a couple of weeks or months (basically
the data you attached in the previous email, but also with the start
times), we'll see how the load is distributed across different days and
time frames and can also calculate the "degree of concurrency on CI".
This would allow us to move such peak loads and infrequent expensive
builds into more idle time frames.
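
To make the idea a bit more concrete, here is a minimal sketch of how the
"degree of concurrency" could be derived once start times are recorded. It
assumes each job is available as a (start, duration) pair; the function
names and the sample data are hypothetical and are not taken from the
attached statistics or from any existing CI tooling.

```python
from datetime import datetime, timedelta


def peak_concurrency(jobs):
    """Return (peak number of simultaneously running jobs, time it was first reached).

    jobs is an iterable of (start: datetime, duration: timedelta) pairs.
    """
    events = []
    for start, duration in jobs:
        events.append((start, 1))              # job starts
        events.append((start + duration, -1))  # job ends
    # Sort by time; at identical timestamps ends (-1) sort before starts (+1),
    # so back-to-back jobs are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    peak_time = None
    for when, delta in events:
        running += delta
        if running > peak:
            peak, peak_time = running, when
    return peak, peak_time


def busy_hours_by_hour_of_day(jobs):
    """Accumulate CI job-hours into 24 buckets, one per hour of the day."""
    busy = [0.0] * 24
    for start, duration in jobs:
        end = start + duration
        cursor = start
        while cursor < end:
            # Split the job at each full-hour boundary.
            next_hour = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            chunk_end = min(end, next_hour)
            busy[cursor.hour] += (chunk_end - cursor).total_seconds() / 3600
            cursor = chunk_end
    return busy


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        (datetime(2026, 3, 1, 14, 0), timedelta(minutes=90)),
        (datetime(2026, 3, 1, 14, 30), timedelta(minutes=30)),
        (datetime(2026, 3, 1, 15, 0), timedelta(minutes=10)),
    ]
    print(peak_concurrency(sample))
    print(busy_hours_by_hour_of_day(sample)[14:16])
```

Dividing each hourly bucket by the number of days observed would give the
average number of jobs running in that hour, which is the kind of figure
that would show where the idle time frames are.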
--
Alexander