abrt + X Error => zillions of duplicate bug reports?

Wed Nov 25 09:20:09 UTC 2009

Hi Adam,

please see below.

On 11/24/2009 08:15 PM, Adam Williamson wrote:
> On Sun, 2009-11-22 at 19:21 +0100, Martin Sourada wrote:
>> So,
>>
>> since I've already received 3 separate bug reports caused by BadIDChoice
>> X Error in subtitleeditor [1][2][3] (haven't had enough time to debug
>> and try to fix it yet though) by abrt, I wonder if there is any room for
>> duplicate detection improvement in these cases, or if we are doomed to
>> zillions of duplicates in rhbz? (btw. otherwise abrt is awesome, IMHO
>> the bug reports from abrt are much more useful than before :-)
>
> We discussed this issue at the Bugzappers meeting today. BZ would like
> to register that the high level of duplicates reported by abrt is a
> significant issue for triage work. We're not sure we can sustainably
> triage some major components (e.g. Firefox) if the current situation
> continues.
>
> We came up with several possible courses of action. First, we
> acknowledge that abrt team is working on improving duplicate detection,
> but Matej noted that this is intrinsically hard work and abrt will
> likely never be able to eliminate or even come close to eliminating
> duplicate reporting.

The duplicate detection algorithm in the currently released version of 
ABRT is very rudimentary: it removes only the most obvious duplicates in 
simple programs. As far as I know, it does not work for applications 
with a variable number of threads (e.g. Firefox).

Fortunately, we now have a new duplicate detection algorithm that 
handles all of these cases significantly better. Most of the code is 
written, but it needs some testing before release. I guess it will take 
two weeks or so to finish it and make sure it works well.

An important property of the new algorithm is that it errs on the side 
of false duplicates: it will much more often mark a bug as a duplicate 
of another bug, even when that is sometimes not the case. This should 
make the abrt bug flow sustainable, and then we can gradually improve 
the detection mechanism to be more accurate.

>
> Second, we wondered if abrt team might be able to assist in running any
> improved duplicate detection mechanisms over already-reported bugs in
> Bugzilla retrospectively. We will follow up with them about that.
>
> Third, we agreed to look at methods used in GNOME and other Bugzillas to
> cope with high levels of duplicate reporting from automated tools, such
> as extracting significant sections of tracebacks as bug comments to make
> manual duplicate detection faster and easier.

Good idea.

>
> Finally, we considered - and rather approved of - the proposal that's
> been floated on this list (and was floated in the meeting by Will
> Woods) to consider using the mechanism used by the kernel developers for
> kernel oopses: instead of being reported direct to Bugzilla, these are
> reported to an intermediate site (kerneloops.org) and can be promoted
> from there to Bugzilla if appropriate. Will is planning to work on this
> idea after finishing up some AutoQA work, and will talk to abrt team
> about it and see if they are interested in helping. He would welcome
> contact from anyone else who's interested in helping with that, too.

Once duplicate detection works, it would be a loss not to have the 
crashes reported directly in Bugzilla. I often see crashes reported by 
ABRT get located in the code and fixed.

If we fail to deliver better detection, then an intermediate site is 
certainly a better target for thousands of duplicates than Bugzilla.

I would propose creating an intermediate site as a target for users 
who are not experienced enough to create a Bugzilla account and respond 
to questions, or who simply do not care. They could then report almost 
automatically, and we could collect a lot of backtraces and support 
data that is currently lost. However, this must be thought through 
carefully, because backtraces raise security issues.

>
> That's all, really - I just took an action item to pass on our thoughts
> about this :)
>

Best regards,
Karel