Bugreporting barrier is too low with the new Dr. Konqi

Sat Nov 7 16:52:04 GMT 2009

On Sat, Nov 07, 2009 at 04:04:10PM +0100, Boudewijn Rempt wrote:
> On Saturday 07 November 2009, Niko Sams wrote:
> > To improve the current situation Andreas complained about, we need
> > to find duplicates automatically. Is that really that difficult? We
> > shouldn't be the first ones having that problem...
> 
> Well, most software producers control their builds, so there is a
> limited number of builds out in the wild, and they check dups in crash
> dumps per build.
>
uncontrolled build configurations are a bigger problem for the actual
debugging than for comparing backtraces.

to help the former, the trace could embed source code (three lines for
each frame sound reasonable; a list of known cut-off points can be
maintained to keep the traces smaller). getting that information into
the trace is a bit of a challenge, obviously. if it is a build from
source and the unchanged sources are still on disk, it is trivial. for
distributor-provided packages an interface needs to be defined which
allows obtaining the exact sources for any given
package/version/file/line tuple (and, of course, drkonqi needs to be
able to query the packaging system for the exact origin of a given file,
but that's Simple (TM)).
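
for the build-from-source case, a minimal sketch of collecting the three
lines per frame could look like this. the function name source_context
and its context parameter are made up for illustration; nothing like
this is claimed to exist in drkonqi, and the distributor-package case is
not covered at all.

```python
from typing import List, Tuple


def source_context(path: str, line: int, context: int = 1) -> List[Tuple[int, str]]:
    """Return (line number, text) pairs around `line` (1-based) in `path`.

    With context=1 this yields up to three lines: the line itself plus
    one before and one after. An unreadable file or an out-of-range
    line number yields an empty list, so the frame is simply left
    without embedded source.
    """
    if line < 1:
        return []
    try:
        with open(path, encoding="utf-8", errors="replace") as f:
            lines = f.read().splitlines()
    except OSError:
        return []
    first = max(1, line - context)
    last = min(len(lines), line + context)
    return [(n, lines[n - 1]) for n in range(first, last + 1)]
```

this only works if the sources on disk are unchanged since the build;
the sketch does not check that.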

regarding comparing traces, we are starting with two things:
- a trace directly from gdb. parsing that is Trivial (TM).
- a trace embedded in a bug report (note that there might be multiple
  traces which need to be considered separately). finding that trace is
  Simple (TM) and was obviously already done. parsing that trace is Not
  Hard (TM), either.
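
to give an idea of the parsing step, here is a rough sketch. it assumes
the usual one-line-per-frame gdb output ("#N  [ADDR in] FUNC (ARGS) at
FILE:LINE" or "... from MODULE"); the Frame type and parse_backtrace are
names invented for this example, and lines that do not look like a
frame (thread headers, "<signal handler called>") are just skipped.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    function: Optional[str]  # None when gdb printed "??"
    module: Optional[str]    # shared object, from "from /path/lib.so"
    file: Optional[str]      # source file, from "at file:line"
    line: Optional[int]


# "#3  0x00007f... in Foo::bar (" or "#3  main ("
FRAME_START = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.+?) \(")
LOCATION = re.compile(r" at (\S+):(\d+)$")
MODULE = re.compile(r" from (\S+)$")


def parse_backtrace(text: str) -> List[Frame]:
    frames = []
    for raw in text.splitlines():
        entry = raw.strip()
        start = FRAME_START.match(entry)
        if not start:
            continue  # not a frame line
        function = start.group(2)
        if function == "??":
            function = None  # no symbol available for this frame
        loc = LOCATION.search(entry)
        mod = MODULE.search(entry)
        frames.append(Frame(
            index=int(start.group(1)),
            function=function,
            module=mod.group(1) if mod else None,
            file=loc.group(1) if loc else None,
            line=int(loc.group(2)) if loc else None,
        ))
    return frames
```

a trace pulled out of a bug report would go through the same function
once it has been cut out of the surrounding text.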

at this point we have two traces, frame by frame. comparing the two
would be trivial, were it not for two complications:
- symbols may be missing for some objects, in which case only the
  module and possibly the function is known, but no exact locations.
- entire frames may be missing due to inlining.

however, even that incomplete information is sufficient to calculate a
similarity score. based on some thresholds, the user can be presented
with "no or unlikely match" (file new report), "likely match" (please
verify if the description indicates a duplicate) and "certain match" (no
way to file a new report is offered at all). when a duplicate is found,
the user's trace is only added to the existing report if it is richer in
detail than all the existing ones.
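
one possible way to compute such a score, as a sketch only: align the
two traces with a longest-common-subsequence style pass, so that frames
missing due to inlining just become gaps, and give half credit when a
function name is unknown but the module matches. the weights (1.0 and
0.5), the normalisation by the longer trace and the thresholds (0.5 and
0.9) are arbitrary assumptions that would need tuning against real
reports.

```python
from typing import List, Optional, Tuple

# (function, module); None means "unknown", e.g. missing symbols
Frame = Tuple[Optional[str], Optional[str]]


def frame_score(a: Frame, b: Frame) -> float:
    func_a, mod_a = a
    func_b, mod_b = b
    if func_a is not None and func_b is not None:
        return 1.0 if func_a == func_b else 0.0
    # at least one function name is missing: fall back to the module
    if mod_a is not None and mod_a == mod_b:
        return 0.5
    return 0.0


def similarity(t1: List[Frame], t2: List[Frame]) -> float:
    """Best in-order alignment score, scaled to the range 0..1."""
    if not t1 or not t2:
        return 0.0
    n, m = len(t1), len(t2)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j],  # skip a frame of t1 (e.g. inlined in t2)
                best[i][j - 1],  # skip a frame of t2
                best[i - 1][j - 1] + frame_score(t1[i - 1], t2[j - 1]),
            )
    return best[n][m] / max(n, m)


def classify(score: float, likely: float = 0.5, certain: float = 0.9) -> str:
    if score >= certain:
        return "certain match"
    if score >= likely:
        return "likely match"
    return "no or unlikely match"
```

with these numbers, a four-frame trace compared against the same trace
with one frame inlined away scores 0.75 and lands in "likely match".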