CI Utilisation and system efficiency
Ben Cooksley
bcooksley at kde.org
Tue Apr 22 09:57:14 BST 2025
On Tue, Apr 22, 2025 at 9:53 AM Maciej Jesionowski <yavnrh at gmail.com> wrote:
> Hi,
>
Hi Maciej,
> Are these servers running multiple builds at a time, or is a Windows build
> given the full host resources, i.e. 8c/16t and 64GB of RAM? If you can
> monitor the resources in real time, it would be interesting to confirm if
> indeed the CPU utilization is significantly lower than 100%, meaning
> something else is bottlenecking it.
>
There is a reasonable probability that the systems are processing multiple
builds at any given moment.
During normal working hours in Europe that is almost guaranteed to be
the case.
I am almost 100% certain that the Krita build times on Windows are being
adversely affected by the inefficiencies of Docker on Windows, and if what I
saw with Craft builds is anything to go by, the speedup for Krita is likely
to be significant.
(Craft builds would essentially be unworkable without the punch-through we
currently provide.)
>
> I'm not sure what the expected build time of a native (i.e. no
> Docker/VM/etc.) build on these servers is, but for reference, I'm seeing a bit
> less than 13 minutes on a stock Ryzen 9 9950X (a developer build, including
> test apps). I can see this number going way up with fewer cores, but still,
> over 100 minutes is very long.
>
Is that a clean build or an incremental build?
These servers are https://www.hetzner.com/dedicated-rootserver/ax52/ for
the record.
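A rough sanity check on these numbers, as a minimal Python sketch: the Ryzen 9 9950X has 16 cores and the Ryzen 7 7700 in these servers has 8 (the 8c/16t mentioned above). Assuming, optimistically, a clean build that scales linearly with core count, and ignoring differences in OS, build configuration, clock speed and I/O, the 13-minute figure translates to roughly 26 minutes on an idle server. A Windows build exceeding 100 minutes is therefore several times slower than core count alone would explain. The function name is purely illustrative.

```python
def scaled_build_minutes(reference_minutes: float,
                         reference_cores: int,
                         target_cores: int) -> float:
    """Estimate build time on another machine.

    Assumption (optimistic): build time scales perfectly linearly with
    physical core count and nothing else differs between the machines.
    """
    if reference_cores <= 0 or target_cores <= 0:
        raise ValueError("core counts must be positive")
    return reference_minutes * reference_cores / target_cores


# Ryzen 9 9950X: 16 cores; Ryzen 7 7700: 8 cores.
estimate = scaled_build_minutes(13.0, 16, 8)
print(f"Idealised estimate on an idle 8-core server: {estimate:.0f} minutes")
# How far the observed >100 minutes is from the idealised estimate.
print(f"Slowdown not explained by core count: at least {100 / estimate:.1f}x")
```

Concurrent builds on the same host and Docker filesystem overhead would both push real numbers well above this idealised estimate.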
> Thanks,
> Maciej.
>
Cheers,
Ben
>
> On Mon, Apr 21, 2025 at 9:15 PM Ben Cooksley <bcooksley at kde.org> wrote:
>
>> On Tue, Apr 22, 2025 at 5:57 AM Dmitry Kazakov <dimula73 at gmail.com>
>> wrote:
>>
>>> Hi, Ben!
>>>
>>
>> Hey Dmitry,
>>
>>
>>>
>>> As for Krita, most of the CI time is spent on the Windows pipeline, which
>>> builds extremely slowly due to some obscure filesystem issues (searching for
>>> includes is extremely slow). I personally don't know how to fix it. I
>>> tried: 1) PCH builds, 2) relative includes, 3) split debug info (dwo). The
>>> only solution left is to rewrite a huge portion of Krita to reduce the
>>> number of includes, which is obviously not an option at the moment.
>>>
>>
>> This is probably at least in part due to Docker on Windows having
>> extremely poor file system performance, even vs. straight NTFS (which isn't
>> great to begin with).
>> That will be fixed by VM-based CI (progress update: I have most of the
>> tool that will manage the underlying base images written now; I just need to
>> finish the VM provisioning part and give it some serious testing).
>>
>>
>>>
>>> Another thing that costs Krita extra build time is the inappropriate
>>> 100-minute timeout. A lot of our Windows builds are terminated at around
>>> 95% completion because of this timeout, so we have to rerun them and,
>>> effectively, consume more and more CI time.
>>>
>>
>> Have you got a list of these so I can have a look to see if the timeout
>> is set too low?
>> Increasing the timeout is only a temporary fix though; we will need to
>> find out why the builds are taking so long.
>>
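To put the cost of these timeouts in numbers: every attempt that hits the limit consumes the full 100 minutes and produces nothing. If a hypothetical fraction p of attempts finish within the limit (taking t minutes when they do), the number of failed attempts before a success is geometrically distributed with mean (1 - p)/p, so the expected runner time per successful build is (1 - p)/p x 100 + t. A minimal Python sketch; the values of p and t are illustrative assumptions, not measurements from invent.kde.org, and the function name is hypothetical.

```python
TIMEOUT_MINUTES = 100  # the job timeout discussed above


def expected_ci_minutes(p_success: float, success_minutes: float,
                        timeout_minutes: float = TIMEOUT_MINUTES) -> float:
    """Expected runner time consumed per *successful* build.

    Assumes attempts are independent and each one finishes within the
    timeout with probability p_success; every failed attempt burns the
    full timeout. The expected number of failures before the first
    success (geometric distribution) is (1 - p) / p.
    """
    if not 0 < p_success <= 1:
        raise ValueError("p_success must be in (0, 1]")
    expected_failures = (1 - p_success) / p_success
    return expected_failures * timeout_minutes + success_minutes


# Hypothetical: half of the attempts finish, taking 95 minutes when they do.
print(expected_ci_minutes(0.5, 95))  # 195.0
```

Under these assumptions a build that fails half the time roughly doubles its CI cost, which is why fixing the underlying slowness matters more than the timeout value itself.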
>>
>>>
>>> ---
>>> Dmitry Kazakov
>>>
>>
>> Cheers,
>> Ben
>>
>>
>>>
>>> On Fri, 18 Apr 2025, 21:27 Ben Cooksley <bcooksley at kde.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Over the past week or two there have been a number of complaints
>>>> regarding CI builder availability, which I investigated
>>>> this morning.
>>>>
>>>> Part of this is related to the Windows CI builders falling offline due
>>>> to OOM events; however, the rest is simply due to a lack of available
>>>> builder time (which is what this email is focused on).
>>>>
>>>> Given we have 6 Hetzner AX52 servers connected to GitLab (each equipped
>>>> with a Ryzen 7 7700 CPU, 64GB RAM and NVMe storage), the problem is not a
>>>> lack of build power; it is the number of builds and the length of those
>>>> builds.
>>>>
>>>> This morning I ran a basic query to ascertain the top 20 projects for
>>>> CI time utilisation on invent.kde.org which revealed the following:
>>>>
>>>> full_path | time_used | job_count
>>>> ------------------------------+------------------+-----------
>>>> plasma/kwin | 320:47:04.966412 | 2387
>>>> graphics/krita | 178:03:19.080763 | 423
>>>> multimedia/kdenlive | 174:08:09.876842 | 697
>>>> network/ruqola | 173:17:47.311305 | 555
>>>> plasma/plasma-workspace | 155:10:03.618929 | 660
>>>> network/neochat | 138:03:23.926652 | 1546
>>>> education/kstars | 129:49:17.74229 | 329
>>>> sysadmin/ci-management | 111:21:09.739792 | 154
>>>> plasma/plasma-desktop | 108:56:52.849433 | 776
>>>> kde-linux/kde-linux-packages | 81:00:10.001937 | 33
>>>> kdevelop/kdevelop | 59:40:51.54474 | 217
>>>> office/kmymoney | 54:32:00.24623 | 271
>>>> frameworks/kio | 53:54:19.046685 | 690
>>>> education/labplot | 52:36:30.343671 | 245
>>>> murveit/kstars | 52:32:56.882728 | 128
>>>> frameworks/kirigami | 47:07:19.172935 | 1627
>>>> system/dolphin | 46:09:58.02836 | 705
>>>> kde-linux/kde-linux | 39:25:54.052469 | 46
>>>> utilities/kate | 36:09:22.18958 | 356
>>>> wreissenberger/kstars | 35:58:14.120515 | 105
>>>>
>>>> If we look closely, KStars has three spots on this list (totalling roughly
>>>> 218 hours of CI time, making it the biggest app user of CI time).
>>>>
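The KStars total can be checked by summing the three `time_used` values from the table, hours, minutes and seconds included. A minimal Python sketch, assuming the column is in `hours:minutes:seconds` form as shown; the function and variable names are illustrative.

```python
from datetime import timedelta


def parse_interval(text: str) -> timedelta:
    """Parse an 'HHH:MM:SS.ffffff' duration as printed in the table above."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


# The three KStars entries from the table (time_used column).
kstars_rows = {
    "education/kstars": "129:49:17.74229",
    "murveit/kstars": "52:32:56.882728",
    "wreissenberger/kstars": "35:58:14.120515",
}

# sum() needs a timedelta start value, since 0 + timedelta is a TypeError.
total = sum((parse_interval(v) for v in kstars_rows.values()), timedelta())
print(f"{total.total_seconds() / 3600:.1f} hours")  # 218.3 hours
```

Adding only the hour fields gives 216; the minutes contribute a little over two further hours.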
>>>> Projects on the above list are asked to please review their jobs and
>>>> how they are conducting development to ensure CI time is used efficiently
>>>> and appropriately.
>>>>
>>>> Other projects should also please review their usage and optimise
>>>> accordingly even if they're not on this list, as there are efficiencies to be
>>>> found in all projects.
>>>>
>>>> When reviewing the list of CI builds your project has enabled, it is
>>>> important to consider to what degree your project benefits from having
>>>> each of them enabled. One common pattern I've seen is having Alpine, SUSE
>>>> Qt 6.9 and SUSE Qt 6.10 all enabled.
>>>>
>>>> If you need to verify building on Alpine / musl-based systems and wish
>>>> to monitor for Qt Next regressions, then you probably shouldn't also have a
>>>> conventional Linux Qt stable build, as those two jobs between them already
>>>> cover that set of permutations.
>>>>
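The coverage argument above can be expressed as a small set-cover check: model each job as the set of concerns it exercises, then drop any job whose concerns are all covered by the jobs that remain. A minimal Python sketch; the concern labels, the `alpine_qt69` job name and the assumption that the Alpine job builds against stable Qt are illustrative, not taken from the actual CI templates.

```python
# Hypothetical model of which concerns each CI job exercises.
# suse_qt69 / suse_qt610 follow the job names used for KWin below;
# "alpine_qt69" and the concern labels are illustrative assumptions.
JOBS: dict[str, set[str]] = {
    "alpine_qt69": {"musl", "qt-stable"},
    "suse_qt69": {"glibc", "qt-stable"},
    "suse_qt610": {"glibc", "qt-next"},
}


def redundant_jobs(jobs: dict[str, set[str]]) -> list[str]:
    """Greedily drop jobs whose concerns are all covered by the jobs that remain."""
    remaining = dict(jobs)
    dropped: list[str] = []
    for name in list(jobs):
        # Union of the concerns of every other job still enabled.
        others = set().union(*(c for n, c in remaining.items() if n != name))
        if remaining[name] <= others:
            dropped.append(name)
            del remaining[name]
    return dropped


print(redundant_jobs(JOBS))  # ['suse_qt69']
```

The greedy pass depends on iteration order when several jobs are mutually redundant, so its output is a suggestion to review rather than a definitive answer.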
>>>> I've taken a quick look at some of these and can suggest the following:
>>>>
>>>> KWin: it has two conventional Linux jobs (suse_qt69 and suse_qt610)
>>>> plus a custom reduced feature set job. It seems like one of these
>>>> conventional Linux jobs should be dropped.
>>>>
>>>> KStars: Appears to have a custom Linux job in addition to a
>>>> conventional Linux job. Choose one please.
>>>>
>>>> Ruqola: Appears to be following a development process whereby changes
>>>> are made in stable and then immediately merged to master in a never-ending
>>>> loop. Please discontinue this practice and only merge stable
>>>> to master periodically.
>>>>
>>>> It also needs to drop one of its Linux jobs, as they duplicate
>>>> functionality as noted above.
>>>>
>>>> Plasma Workspace/Desktop: At least in part this seems to be driven by
>>>> Appium tests. Please reduce the number of these and/or streamline the
>>>> process for running an Appium test. Consideration should be given to
>>>> enabling the CI option use-ccache as well.
>>>>
>>>> KDevelop: Please enable the CI option use-ccache.
>>>>
>>>> Labplot: Appears to have an unusual customisation of the
>>>> standard jobs, which shouldn't be necessary as flags in .kde-ci.yml should
>>>> permit that to be done.
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>
More information about the kde-devel mailing list