CI Utilisation and system efficiency
Maciej Jesionowski
yavnrh at gmail.com
Mon Apr 21 22:52:57 BST 2025
Hi,
Are these servers running multiple builds at a time, or is a Windows build
given the full host resources, i.e. 8c/16t and 64GB of RAM? If you can
monitor the resources in real time, it would be interesting to confirm
whether the CPU utilization is indeed significantly lower than 100%, which
would mean something else is the bottleneck.
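If it helps, a quick script along these lines run alongside a build would
show whether the cores are actually saturated or mostly idle. This is just a
rough sketch using psutil (pip install psutil); the sampling interval is
arbitrary:

import time
import psutil

INTERVAL_SECONDS = 5  # sampling period, tune as needed

# Sample overall CPU utilization and memory use until interrupted (Ctrl+C).
print("elapsed, cpu_percent, mem_used")
start = time.time()
while True:
    cpu = psutil.cpu_percent(interval=INTERVAL_SECONDS)  # averaged over the interval
    mem = psutil.virtual_memory()
    elapsed = int(time.time() - start)
    print(f"{elapsed:>6}s, {cpu:5.1f}%, {mem.used / 2**30:6.1f} GiB")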
I'm not sure what the expected build time of a native (i.e. no
docker/VM/etc.) build is on these servers, but for reference, I'm seeing a
bit less than 13 minutes on a stock Ryzen 9 9950X (a developer build,
including test apps). I can see this number go way up with fewer cores, but
still, over 100 minutes is very long.
Thanks,
Maciej.
On Mon, Apr 21, 2025 at 9:15 PM Ben Cooksley <bcooksley at kde.org> wrote:
> On Tue, Apr 22, 2025 at 5:57 AM Dmitry Kazakov <dimula73 at gmail.com> wrote:
>
>> Hi, Ben!
>>
>
> Hey Dmitry,
>
>
>>
>> As for Krita, most of the CI time is spent on the Windows pipeline, which
>> builds extremely slowly due to some obscure filesystem issues (searching
>> for includes is extremely slow). I personally don't know how to fix it. I
>> tried: 1) PCH builds, 2) relative includes, 3) split debug info (dwo). The
>> only solution left is to rewrite a huge portion of Krita to reduce the
>> amount of includes, which is obviously not an option at the moment.
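>>
>> As a quick way to gauge the scale of the include problem, a throwaway
>> script along these lines can rank which headers are directly #included
>> most often. This is just an illustrative sketch; the source directory
>> below is an example:
>>
>> import collections
>> import pathlib
>> import re
>>
>> SRC = pathlib.Path("libs")  # example path; point it at the source tree
>> INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^">]+)[">]')
>>
>> counts = collections.Counter()
>> for path in SRC.rglob("*"):
>>     if path.suffix not in {".h", ".hpp", ".cpp", ".cc"}:
>>         continue
>>     for line in path.read_text(errors="ignore").splitlines():
>>         m = INCLUDE_RE.match(line)
>>         if m:
>>             counts[m.group(1)] += 1
>>
>> # The most frequently included headers are the first candidates for
>> # trimming, forward declarations or a PCH.
>> for header, n in counts.most_common(20):
>>     print(f"{n:6d}  {header}")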
>>
>
> This is probably at least in part due to Windows on Docker having
> extremely poor file system performance, even vs. straight NTFS (which isn't
> great to begin with).
> That will be fixed by VM-based CI (progress update: I have most of the tool
> that will manage the underlying base images written now; I just need to
> finish the VM provisioning part and give it some serious testing).
>
>
>>
>> Another point that requires extra build time for Krita is an
>> inappropriately short timeout of 100 minutes. A lot of our Windows builds
>> are terminated at around 95% completion because of this timeout, so we
>> have to rerun them and, effectively, consume even more CI time.
>>
>
> Have you got a list of these so I can have a look to see if the timeout is
> set too low?
> Increasing the timeout is only a temporary fix, though; we will need to
> find out why the builds are taking so long and address that.
>
>
>>
>> ---
>> Dmitry Kazakov
>>
>
> Cheers,
> Ben
>
>
>>
>> Fri, 18 Apr 2025, 21:27 Ben Cooksley <bcooksley at kde.org>:
>>
>>> Hi all,
>>>
>>> Over the past week or two there have been a number of complaints
>>> regarding CI builder availability, which I've done some investigating
>>> into this morning.
>>>
>>> Part of this is related to the Windows CI builders falling offline due
>>> to OOM events; the rest is simply due to a lack of available builder time
>>> (which is what this email is focused on).
>>>
>>> Given we have 6 Hetzner AX51 servers connected to Gitlab (each equipped
>>> with a Ryzen 7 7700 CPU, 64GB RAM and NVMe storage), the issue is not
>>> available build power; it is the number of builds and the length of those
>>> builds.
>>>
>>> This morning I ran a basic query to ascertain the top 20 projects for CI
>>> time utilisation on invent.kde.org, which revealed the following (a rough
>>> way to approximate these figures per project is sketched below the
>>> table):
>>>
>>> full_path | time_used | job_count
>>> ------------------------------+------------------+-----------
>>> plasma/kwin | 320:47:04.966412 | 2387
>>> graphics/krita | 178:03:19.080763 | 423
>>> multimedia/kdenlive | 174:08:09.876842 | 697
>>> network/ruqola | 173:17:47.311305 | 555
>>> plasma/plasma-workspace | 155:10:03.618929 | 660
>>> network/neochat | 138:03:23.926652 | 1546
>>> education/kstars | 129:49:17.74229 | 329
>>> sysadmin/ci-management | 111:21:09.739792 | 154
>>> plasma/plasma-desktop | 108:56:52.849433 | 776
>>> kde-linux/kde-linux-packages | 81:00:10.001937 | 33
>>> kdevelop/kdevelop | 59:40:51.54474 | 217
>>> office/kmymoney | 54:32:00.24623 | 271
>>> frameworks/kio | 53:54:19.046685 | 690
>>> education/labplot | 52:36:30.343671 | 245
>>> murveit/kstars | 52:32:56.882728 | 128
>>> frameworks/kirigami | 47:07:19.172935 | 1627
>>> system/dolphin | 46:09:58.02836 | 705
>>> kde-linux/kde-linux | 39:25:54.052469 | 46
>>> utilities/kate | 36:09:22.18958 | 356
>>> wreissenberger/kstars | 35:58:14.120515 | 105
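>>>
>>> (If you want to track your own project's figure without database access,
>>> the per-job duration field in the GitLab API gives roughly the same
>>> number. The sketch below is untested; the project path, token and time
>>> window are just examples.)
>>>
>>> import datetime
>>> import requests
>>>
>>> GITLAB = "https://invent.kde.org/api/v4"
>>> PROJECT = "graphics%2Fkrita"   # URL-encoded project path; use your own
>>> TOKEN = "REDACTED"             # personal access token with read_api scope
>>> CUTOFF = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=7)
>>>
>>> total_seconds = 0.0
>>> page, done = 1, False
>>> while not done:
>>>     r = requests.get(f"{GITLAB}/projects/{PROJECT}/jobs",
>>>                      headers={"PRIVATE-TOKEN": TOKEN},
>>>                      params={"per_page": 100, "page": page})
>>>     r.raise_for_status()
>>>     jobs = r.json()
>>>     if not jobs:
>>>         break
>>>     for job in jobs:  # jobs are returned newest first
>>>         created = datetime.datetime.fromisoformat(
>>>             job["created_at"].replace("Z", "+00:00"))
>>>         if created < CUTOFF:
>>>             done = True
>>>             break
>>>         if job["duration"]:
>>>             total_seconds += job["duration"]
>>>     page += 1
>>>
>>> print(f"CI time over the last 7 days: {total_seconds / 3600:.1f} hours")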
>>>
>>> If we look closely, KStars has three spots on this list (totalling
>>> roughly 218 hours of time used, making it the biggest app user of CI
>>> time).
>>>
>>> Projects on the above list are asked to please review their jobs and how
>>> they are conducting development to ensure CI time is used efficiently and
>>> appropriately.
>>>
>>> Other projects should also please review their usage and optimise
>>> accordingly even if they're not on this list, as there are efficiencies
>>> to be found in all projects.
>>>
>>> When reviewing the list of CI builds your project has enabled, it is
>>> important to consider to what degree your project benefits from having
>>> various builds enabled. One common pattern I've seen is having Alpine,
>>> SUSE Qt 6.9 and SUSE Qt 6.10 all enabled.
>>>
>>> If you need to verify building on Alpine / musl-type systems and wish to
>>> monitor for Qt Next regressions, then you probably shouldn't have a
>>> conventional Linux Qt stable build as well, as those two jobs between
>>> them already cover that set of permutations.
>>>
>>> I've taken a quick look at some of these and can suggest the following:
>>>
>>> KWin: It has two conventional Linux jobs (suse_qt69 and suse_qt610) plus
>>> a custom reduced-feature-set job. It seems like one of these conventional
>>> Linux jobs should be dropped.
>>>
>>> KStars: Appears to have a custom Linux job in addition to a conventional
>>> Linux job. Choose one please.
>>>
>>> Ruqola: Appears to be conducting a development process whereby changes
>>> are made in stable and then immediately merged to master in an
>>> ever-continuing loop. Please discontinue this behaviour and only
>>> periodically merge stable to master.
>>>
>>> Ruqola also needs to drop one of its Linux jobs, since they duplicate
>>> functionality as noted above.
>>>
>>> Plasma Workspace/Desktop: At least in part this seems to be driven by
>>> Appium tests. Please reduce the number of these and/or streamline the
>>> process for running an Appium test. Consideration should be given to
>>> enabling the CI option use-ccache as well.
>>>
>>> KDevelop: Please enable the CI option use-ccache.
>>>
>>> Labplot: Appears to have a strange customisation of the standard jobs in
>>> place, which shouldn't be necessary, as flags in .kde-ci.yml should allow
>>> the same thing to be done.
>>>
>>> Thanks,
>>> Ben
>>>
>>