Gitlab CI: failed unit tests vs. currently passing CI

Sun Jan 23 17:24:29 GMT 2022

On Mon, Jan 24, 2022 at 12:56 AM Albert Astals Cid <aacid at kde.org> wrote:

> El diumenge, 23 de gener de 2022, a les 1:59:01 (CET), Ben Cooksley va
> escriure:
> > On Sun, Jan 23, 2022 at 12:29 PM Albert Astals Cid <aacid at kde.org>
> wrote:
> >
> > > El diumenge, 23 de gener de 2022, a les 0:09:23 (CET), Ben Cooksley va
> > > escriure:
> > > > On Sun, Jan 23, 2022 at 11:29 AM Albert Astals Cid <aacid at kde.org>
> > > wrote:
> > > >
> > > > > El dissabte, 22 de gener de 2022, a les 6:11:29 (CET), Ben
> Cooksley va
> > > > > escriure:
> > > > > > EXCLUDE_DEPRECATED_BEFORE_AND_ATOn Sat, Jan 22, 2022 at 1:31 PM
> > > Friedrich
> > > > > > W. H. Kossebau <kossebau at kde.org> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > >
> > > > > >
> > > > > > > seems that Gitlab CI is currently configured to show the green
> > > > > "Success"
> > > > > > > checkmark for pipeline runs even if unit tests are failing.
> > > > > > >
> > > > > >
> > > > > > That is correct, only compilation or other internal failures
> cause
> > > the
> > > > > > build to show a failure result.
> > > > > >
> > > > > >
> > > > > > > Reasons seems to be that there Gitlab only knows Yay or Nay,
> > > without
> > > > > the
> > > > > > > warning state level known from Jenkins.
> > > > > > >
> > > > > >
> > > > > > Also correct.
> > > > > >
> > > > > >
> > > > > > > And given that quite some projects (sadly) maintain a few
> long-time
> > > > > > > failing
> > > > > > > unit tests, having the pipeline fail on unit tests seems to
> have
> > > been
> > > > > seen
> > > > > > > as
> > > > > > > too aggressive
> > > > > >
> > > > > >
> > > > > > Correct again.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > This of course harms the purpose of the unit tests, when their
> > > failures
> > > > > > > are
> > > > > > > only noticed weeks later, not e.g. at MR discussion time.
> > > > > > >
> > > > > >
> > > > > > Gitlab does note changes in the test suite as can currently be
> seen
> > > on
> > > > > > https://invent.kde.org/frameworks/kio/-/merge_requests/708
> > > > > > Quoting the page:  "Test summary contained 33 failed and 16 fixed
> > > test
> > > > > > results out of 205 total tests"
> > > > > >
> > > > > > It does the same thing for Code Quality - "Code quality scanning
> > > detected
> > > > > > 51 changes in merged results"
> > > > >
> > > > > Don't want to derail the confirmation, but those results are
> terrible,
> > > > > they always say things changed in places not touched by the code of
> > > the MR,
> > > > > any idea why?
> > > > >
> > > >
> > > > Unfortunately not - my only guess would be that cppcheck reports
> results
> > > > slightly differently which Gitlab has issues interpreting.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Seeing how at least in KDE Frameworks first regressions
> sneaked in
> > > > > without
> > > > > > > someone noticing (nobody looks at logs when the surface shows a
> > > green
> > > > > > > checkmark, e.g. kcoreaddons, kwidgetsaddons, kio, purpose,
> krunner
> > > on
> > > > > > > openSUSE
> > > > > > > and possibly more have regressed in recent weeks, see
> > > build.kde.org)
> > > > > this
> > > > > > > should be something to deal with better, right?
> > > > > >
> > > > > >
> > > > > > > Bhushan gave two first ideas just now on #kde-sysadmin:
> > > > > > > > Well we can add a switch that repos can commit to saying test
> > > > > failure is
> > > > > > > build failure
> > > > > > > > Another alternative is we use bot to write a comment on MR
> > > > > > >
> > > > > > > IMHO, to give unit tests the purpose they have, we should by
> > > default to
> > > > > > > let
> > > > > > > test failures be build failures. And have projects opt out if
> they
> > > > > need to
> > > > > > > have some unit tests keep failing, instead of e.g. tagging them
> > > with
> > > > > > > expected
> > > > > > > failures or handling any special environment they run into on
> the
> > > CI.
> > > > > > >
> > > > > > > Your opinions?
> > > > > > >
> > > > > >
> > > > > > The switch will need to be around the other way i'm afraid as
> there
> > > are
> > > > > > simply too many projects with broken tests right now.
> > > > > > The best place for that switch will be in .kde-ci.yml.
> > > > > >
> > > > > > My only concern however would be abuse of this switch, much in
> the
> > > way
> > > > > that
> > > > > > certain projects abuse EXCLUDE_DEPRECATED_BEFORE_AND_AT.
> > > > > > The last thing we would want would be for people to flip this
> switch
> > > and
> > > > > > then leave their CI builds in a failing state - meaning that
> actual
> > > > > > compilation failures would be missed (and then lead to CI
> maintenance
> > > > > > issues)
> > > > > >
> > > > > > Thoughts on that?
> > > > >
> > > > > Test failing should mark the CI as failed, anything other than that
> > > > > doesn't make sense. The CI did fail marking it as passed is lying
> to
> > > > > ourselves.
> > > >
> > > >
> > > > > We can *still* merge failed MR with failed CI, the Merge button is
> just
> > > > > red, but it will work.
> > > > >
> > > >
> > > > There is a big difference between "this doesn't compile" (because
> someone
> > > > forgot to commit a header/etc, dependency change that isn't in place
> or
> > > > because of a platform specific issue) and "some tests failed".
> > > > What that encourages is for people to ignore the results from the CI
> > > system
> > > > - as they'll get used to ignoring the CI system saying something is
> > > failing.
> > >
> > > I disagree, "this doesn't compile" is usually super easy to fix once
> > > spotted, a failing test is usually more nuanced and catching it when it
> > > starts happening is paramount.
> > >
> >
> > Except when it comes to non-Linux platforms where in some cases
> developers
> > are hostile to fixing issues - viewing those platforms as someone else's
> > problem.
> > One of the benefits we were expecting to get out of Gitlab CI was that
> > build failures on platforms like Windows wouldn't end up being merged and
> > would be fixed before it lands.
> >
> > You are now wanting to remove that.
>
> I can see where you're coming from, but just let it be clear that this is
> *not* what I want, what I want is more strictness "green means actually
> working (i.e. test pass), not just compiling" not less.
>

I'm happy to provide that strictness - only on the basis that the project
is ready for that level of strictness.
Otherwise developers will just ignore failing CI because they'll think it
is just the tests failing and won't look more closely.

If a project has broken tests on any platform then it is not ready for more
strict CI and should continue to only have compilation taken into account
as to whether a build fails or not.
The information on failing/passing tests is still presented within the
Gitlab UI.

A number of our projects are not ready for more strict CI - with Windows
being a key stumbling block for a good proportion of Frameworks.

>
> Cheers,
>   Albert
>

Cheers,
Ben

>
> >
> > In that case I guess we should remove everything but Linux CI - because
> the
> > cost of maintaining it will be too high.
> >
> >
> > >
> > > >
> > > > While this is not such a big deal for Linux, it is a massive deal
> for the
> > > > smaller platforms that far less people run.
> > > >
> > > > Saying you can merge when the CI says it is failing is setting
> ourselves
> > > up
> > > > for failure.
> > >
> > > I don't see how hiding (because of the super low visibility they have)
> > > that the tests are failing, or even worse have started failing (status
> quo
> > > and as far as I understand your suggestion) is any better.
> > >
> >
> > If a project has all passing tests and has a commitment to ensure they
> > remain passing, then I have no problem in principle with that project
> being
> > allowed to make failing tests mean a failing CI run.
> > Code should never be merged when the CI run fails though.
> >
> > We should never normalise merging when the CI reports a failure - it
> makes
> > pre-merge CI pointless.
> >
> >
> > >
> > > Cheers,
> > >   Albert
> > >
> >
> > Thanks,
> > Ben
> >
> >
> > >
> > > >
> > > >
> > > > > Maybe this red button will convince people to fix their tests.
> (one can
> > > > > hope, right?)
> > > > >
> > > > > Of course if we do this change it should happen after we've done
> that
> > > > > change that fixes the test failing because of however gitlab CI is
> > > set-up
> > > > > (you mentioned we had to wait for Jenkins to be disabled for that)
> > > > >
> > > > > Cheers,
> > > > >   Albert
> > > > >
> > > >
> > > > Regards,
> > > > Ben
> > > >
> > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Cheers
> > > > > > > Friedrich
> > > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Ben
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.kde.org/pipermail/kde-frameworks-devel/attachments/20220124/b4f7f946/attachment-0001.htm>