Towards Excellent Defect Management

Tue Sep 14 16:23:01 BST 2021

For many years now I've been unhappy with both the quality and volume
of crash reports we get for our software. The barrier for crash report
submissions is incredibly high because we've never really had tech to
help elevate "insufficient" reports to become sufficient, outside the
client on which the crash occurred. Out of the very few people that
might want to report a crash, even fewer will get beyond the first set
of questions from drkonqi. Once they've managed that, they still have
to fight with their distro for debug symbols and quite possibly lose.
Even if they win, there is a good chance the report will either get a
"this isn't very useful. install more symbols" comment or get marked
as a dupe. Meanwhile we are spending our days looking at duplicated
crashes, or finding the right blurb to copy-paste to ask for a better
trace, or trying to find out why software crashes when it hasn't
actually crashed for a year because the bug had already been fixed in
the meantime.

We are wasting our users' time. We are wasting our time. This waste
needs to stop.

The good news is that we have all the technical building blocks to fix
it today. In fact, they are only going to get better in the future.
All it takes is a bit of code and a bit of flexibility on our part.

A while ago I started looking into improving the drkonqi experience.
Specifically: submitting crash reports into a purpose-built crash
tracking system rather than a bug tracking system. The advantages are
fairly obvious and range from server-side de-duplication to
server-side retracing. I've spent many afternoons reading up on and
poking demo instances of every somewhat suitable piece of software I
could find, and Sentry looks like the best option for what we need. It
is practically free software as far as we are concerned, scales
tremendously, and has systems for server-side de-duplication,
server-side cross-distro/platform retracing (which might also help
with some of the open questions of richer tracing for Windows and
Android), data scrubbing (what with privacy concerns), and client and
server-side tags. It can try to figure out when a crash first appeared
if supplied with commit data, and can track the quality of specific
releases and when a given crash was fixed. On top of that there are
health reports, performance tracking, warning rules, health report
emails, ... I've been playing with it for a month and still find
amazing new things!

One of the best things about Sentry is that it has native support for
debuginfod, enabling us to get debug symbols directly from
distributions, solving the entire cross-distro aspect of crash tracing
in just about the neatest way possible. We get the (incomplete) trace
with lots of metadata, and Sentry then uses the metadata to resolve the
symbols through the distros' debuginfod instances to give us a
complete trace.
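
To make the debuginfod step a bit more concrete, here is a minimal
sketch of the lookup involved. This is not Sentry's code, just an
illustration of the debuginfod HTTP API, where debug info for a binary
is served under /buildid/<BUILDID>/debuginfo. The function name
debuginfo_url and the DEFAULT_SERVER constant are made up for this
example; the build ID is assumed to be the hex GNU build ID that comes
along with the crash metadata.

```python
# Federated debuginfod server run by the elfutils project. A distro's
# own debuginfod instance would be queried in exactly the same way.
DEFAULT_SERVER = "https://debuginfod.elfutils.org"

HEX_DIGITS = set("0123456789abcdef")


def debuginfo_url(build_id: str, server: str = DEFAULT_SERVER) -> str:
    """Return the URL a debuginfod client would fetch debug info from.

    build_id is the hex-encoded GNU build ID of the crashed binary or
    library (hypothetical input, e.g. taken from the crash metadata).
    """
    normalized = build_id.strip().lower()
    if not normalized or not set(normalized) <= HEX_DIGITS:
        raise ValueError(f"not a hex build ID: {build_id!r}")
    # debuginfod web API: GET /buildid/<BUILDID>/debuginfo
    return f"{server.rstrip('/')}/buildid/{normalized}/debuginfo"
```

The point is that the build ID alone is enough to find the matching
symbols, so the client never has to install debug packages itself.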

Even better: with relatively minor adjustments to drkonqi we could use
it right now and take immediate advantage of server-side retracing! I
already have a blob of prototype code for drkonqi that piggybacks
Sentry submission onto the existing code such that we can have both
Bugzilla and Sentry.

I am proposing that we roll out a Sentry instance for testing so we
can see if we want to fully embrace it.

You can get a general sense of the features at Sentry's demo instance
https://sentry.io/demo/sandbox/
Here's a code dump for drkonqi
https://invent.kde.org/sitter/drkonqi/-/commits/work/sentry

HS