Getting to 100 % succeeding tests (for 2.9), or simply dropping them all?

Thu Feb 5 06:11:46 GMT 2015

Hi,

currently Calligra (2.9 & master) has 313 tests. Those tests could be used to
automatically catch regressions (even better, CI runs them on every push, so
we do not have to run them ourselves every time). That saves time compared to
users only noticing problems after a release and reporting them incorrectly
in the issue tracker, and devs then spending time to find the real problem and
its cause.
They could also be useful during the port to Qt5/KF5, as they reassure us to a
good degree that things have moved in the right direction.

However, unlike a commit that breaks the build, a change that makes a test
suddenly fail does not immediately pose a problem for everyone, so it seems
easy to just ignore it (and fix it tomorrow, well, the day after tomorrow, ah,
next weekend perhaps).

-> problem 1: no mechanism to make people fix the tests they broke

CURRENT SITUATION

Just... now I have to be the bad boy here and point to
http://build.kde.org/job/calligra_stable/test/?width=800&height=600
There are ~40 tests failing, i.e. about 13 %.

Which means 10 more failing tests than at the beginning of the 2.9 branching,
when it was ~30:
http://build.kde.org/job/calligra_master/Variation=All,label=LINBUILDER/1293/

(And the last build for master still visible on build.kde.org right now, from
26.11.2014, had only 26 of 314 failing:
http://build.kde.org/job/calligra_master/Variation=All,label=LINBUILDER/1235/)

Now tests do not come without a price: everyone waiting on the result of a
Calligra CI build (or a local one) knows how much time they take, even if it
is only the linking.

-> problem 2: running current tests takes a lot of time, too much time locally

PROPOSAL A

Given 10 more failing tests (but no added tests) since the branching of 2.9, 
where actually things should have gotten more stable and correct, we should 
ask ourselves, who is actually looking at those tests. Anyone?

So could we just get rid of them if no one is? :) That would save a very, very
large amount of CPU cycles and hard disk space for everyone, including CI. It
would also mean less code that needs porting.

PROPOSAL B

Are you about to hit your Reply button hard after reading proposal A, because
you actually prefer having tests? So do I, and so surely did the people who
spent the effort to write, review and maintain all those tests.

So how could we get back to using the tests as a first-class utility in our
Calligra development? With e.g. CI reporting STABLE (= no failing tests)
builds every time?

To fix problem 2 we should separate the current tests into unit tests (i.e.
those testing just one thing while mocking the rest of the system as much as
possible), integration tests and other types of tests.
We should also make sure that the unit tests take no more than a few seconds
to run, so no one is kept from using them as part of their workflow, e.g.
before pushing their latest changes to the central repo. "make all test"
should be a normal habit.
And leave the longer-running tests to the CI; hey, that's what it is for.
-> task T0: specify/document different test types (Calligra wiki/build system)
-> task T1: go through all the tests and mark each one as a quickly runnable
unit test, an integration test, or another type of test
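
As a rough sketch of how T1 could get started: given the measured run time of
each test, propose a first split by a time budget. The test names, the
durations and the 2-second budget below are made up for illustration; real
numbers would have to come from a CI or local test run.

```python
# Hypothetical sketch: propose a category for each test from its run time.
# Names, durations and the budget are illustrative assumptions, not real data.

QUICK_BUDGET_SECONDS = 2.0  # assumed limit for a "quickly runnable" test


def propose_categories(durations, budget=QUICK_BUDGET_SECONDS):
    """Split {test name: seconds} into quick candidates and slower tests."""
    quick = sorted(name for name, secs in durations.items() if secs <= budget)
    slow = sorted(name for name, secs in durations.items() if secs > budget)
    return {"quick": quick, "slow": slow}


example_durations = {
    "TestFoo": 0.3,
    "TestBarLoading": 45.0,
    "TestBaz": 1.1,
}

categories = propose_categories(example_durations)
print(categories)
```

Run time alone does not tell a unit test from an integration test, so the
result is only a candidate list that still needs review by hand.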

Even with that test categorization, there are a number of currently failing
tests that need fixing, ideally before the 2.9 release and the port. Some
of them have been failing for ages (e.g. diff between
http://build.kde.org/job/calligra_master/Variation=All,label=LINBUILDER/1235/testReport/
http://build.kde.org/job/calligra_stable/1829/testReport/
)
-> task T2: find all the long-time failing tests, disable them in the build
(and possibly tag them as JJ bugs so they get fixed)
-> task T3: list all the newly failing tests, and let's all fix them ASAP
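
The split between T2 and T3 could be sketched as a comparison of the failing
tests of an older and a newer build. This is only an illustration: the test
names are placeholders, and how the lists of failing tests get extracted from
the two test reports is left open.

```python
# Hypothetical sketch: separate long-time failures (T2) from new ones (T3)
# by comparing the failing tests of an older and a newer build.
# The test names are placeholders, not taken from the real reports.


def split_failures(old_failing, new_failing):
    """Return (long_time, newly_failing, fixed) as sorted lists."""
    old, new = set(old_failing), set(new_failing)
    long_time = sorted(old & new)      # failing in both builds: T2 candidates
    newly_failing = sorted(new - old)  # failing only in the newer build: T3
    fixed = sorted(old - new)          # failed before, passing now
    return long_time, newly_failing, fixed


old_build = ["TestA", "TestB", "TestC"]
new_build = ["TestB", "TestC", "TestD", "TestE"]

long_time, newly_failing, fixed = split_failures(old_build, new_build)
print("long-time failing:", long_time)
print("newly failing:", newly_failing)
print("fixed since:", fixed)
```

A test that fails in both builds is not necessarily a long-time failure, as it
may have passed in between, so comparing more than two builds would give a
more reliable T2 list.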

Problem 1 is a social problem. People need to be aware of the tests (I guess
some might not be) and to value them.
I have no idea whether future CI systems could enforce rejection of commits
that break tests, but ideally people simply feel responsible for breaking
tests, just as they feel responsible for breaking the normal build.

Personally I see tasks T2 and T3 as something to be done first, ideally before
the 2.9 release. Count me in for making that happen :)

But first, please share your thoughts and feedback on this.

Cheers
Friedrich