[Crash-utility] [PATCH V2] take Hardware Error & kernel pointer bug as separate panicmsg
Dave Anderson
anderson at redhat.com
Thu Feb 5 14:31:48 UTC 2015
----- Original Message -----
> Too many kinds of panic are categorized under the same "Oops: xxxx"
> message, which makes this field ambiguous and not very useful:
>
> PANIC: "Oops: 0000 [#1] SMP " (check log for details)
>
> This patch separates out 3 kinds of panicmsg, the most common cases
> among the machines I manage. The match strings are copied exactly
> from the kernel source code. With the patch applied, I get panicmsg like:
>
> include/linux/kernel.h:#define HW_ERR
> panicmsg: "[Hardware Error]: CPU 7: Machine Check Exception: 5 Bank
> 11: f200003f000100b2"
> drivers/char/sysrq.c:__handle_sysrq
> panicmsg: "SysRq : Trigger a crash"
> arch/x86/kernel/traps.c:do_general_protection
> panicmsg: "general protection fault: 8800 [#1] SMP"
> arch/x86/mm/fault.c:show_fault_oops
> panicmsg: "BUG: unable to handle kernel paging request at
> 00001248a68eb328"
>
> The SysRq matching lines need to move ahead of the "Oops" match: a
> SysRq-triggered crash usually logs an Oops as well, so SysRq must take precedence.
>
> Signed-off-by: Derek Che <drc at yahoo-inc.com>

Hi Derek,

As I mentioned earlier, in addition to checking for the general
protection faults, in my testing I found several other instances
where the "Oops" message could be replaced with the more meaningful
messages that preceded it, such as double faults, divide errors,
stack segment faults, "Kernel BUG" (with a capital K), "Unable to
handle kernel ..." (with a capital U), etc. I also added a few
break statements so that parsing stops once a searched-for message
is found, instead of continuing through the rest of the kernel log.
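
The matching order described above can be sketched roughly as follows.
This is only an illustration, not the code that was committed to crash:
the helper name pick_panic_line() and the pattern list are hypothetical,
and the patterns merely paraphrase the messages named in this thread.
The idea is that a specific message wins over the generic "Oops: " line,
and that the scan stops as soon as a specific message is seen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative patterns only: they paraphrase the messages named in this
 * thread and are NOT copied from the crash source or from the commit.
 */
static const char *const specific_msgs[] = {
    "SysRq : Trigger a crash",
    "general protection fault",
    "double fault",
    "divide error",
    "stack segment",
    "Kernel BUG",
    "Unable to handle kernel",
    "BUG: unable to handle kernel",
};

#define NUM_SPECIFIC (sizeof(specific_msgs) / sizeof(specific_msgs[0]))

/*
 * Hypothetical helper: return the index of the log line to report as the
 * panic message, or -1 if no line matches.  The first line containing a
 * specific message ends the scan; the first "Oops: " line is remembered
 * and used only if no specific message is found anywhere in the log.
 */
int pick_panic_line(const char *const *lines, size_t n)
{
    int oops = -1;
    size_t i, k;

    assert(lines != NULL || n == 0);

    for (i = 0; i < n; i++) {
        for (k = 0; k < NUM_SPECIFIC; k++) {
            if (strstr(lines[i], specific_msgs[k]))
                return (int)i;  /* specific match: stop parsing */
        }
        if (oops < 0 && strstr(lines[i], "Oops: "))
            oops = (int)i;      /* generic fallback only */
    }
    return oops;
}
```

Given a log in which "SysRq : Trigger a crash" is followed by a
"BUG: unable to handle kernel ..." line and an "Oops: " line, this
sketch selects the SysRq line, which is the precedence the patch
description asks for.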

However, the machine check string search still follows the "kernel panic - "
check, which I understand is the opposite of what you would prefer. The
fatal error strings being searched for come from die()
calls, or from other message sources that are part of the kernel crash
sequence. The machine check messages, on the other hand, are generated
by a stream of pr_emerg(HW_ERR) calls, and are not necessarily
(although likely) precursors to a crash. But since the kernel panic
message does contain the "Fatal machine check" message, the reason
behind the crash is readily evident.
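
A minimal sketch of that precedence, under stated assumptions: the
function name panic_reason() is hypothetical, the sample strings are
illustrative, and none of this is taken from the crash source. The
kernel prints its panic line with a capital K ("Kernel panic - "), so
that is the string matched here. A panic line anywhere in the log wins;
a "[Hardware Error]: " line is reported only when no panic line exists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the log line that best explains the crash,
 * or NULL if neither kind of message is present.  The panic check takes
 * precedence over the machine check lines, since the latter come from
 * pr_emerg(HW_ERR) output that does not necessarily precede a crash.
 */
const char *panic_reason(const char *const *lines, size_t n)
{
    const char *hw_err = NULL;
    size_t i;

    assert(lines != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (strstr(lines[i], "Kernel panic - "))
            return lines[i];            /* panic line always wins */
        if (hw_err == NULL && strstr(lines[i], "[Hardware Error]: "))
            hw_err = lines[i];          /* keep the first one as a fallback */
    }
    return hw_err;
}
```

For a log that holds both a machine check line and a panic line that
mentions "Fatal machine check", the sketch reports the panic line, so
the cause of the crash is still visible in the result.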

I appreciate your getting the ball rolling here, as it was certainly
due for an update/improvement.

Queued for crash-7.1.0:

https://github.com/crash-utility/crash/commit/c3840016bf1770b6b1cf571202f2c554fcd1cf55

Thanks,
Dave