[Crash-utility] infinite loop in crash due to double-NMI on x86_64 system
Dave Anderson
anderson at redhat.com
Mon Jun 28 21:14:47 UTC 2010
----- "Lucas Silacci" <Lucas.Silacci at teradata.com> wrote:
> Sorry, guess I wasn't clear. Nobody hit the dump switch on these
> systems. They simply had multiple hardware errors that apparently
> triggered the NMI more than once. That's what I was trying to show with
> the SEL records, that the multiple NMIs were straight from hardware with
> no human intervention.
>
> The systems went through a panic (due to multiple NMIs),
That's what I'm trying to figure out -- when and how was it decided that
the machine should panic instead of continuing to handle the stream of NMIs?
In other words, this "dumpsw_notify" function -- why was it called?
> > PID: 0 TASK: ffffffff8038c340 CPU: 0 COMMAND: "swapper"
> > #0 [ffffffff8046dc50] machine_kexec at ffffffff8011a95b
> > #1 [ffffffff8046dd20] crash_kexec at ffffffff80154351
> > #2 [ffffffff8046dde0] panic at ffffffff801327fa
> > #3 [ffffffff8046ded0] dumpsw_notify at ffffffff8831c0c3
> > #4 [ffffffff8046dee0] notifier_call_chain at ffffffff8032481f
> > #5 [ffffffff8046df00] default_do_nmi at ffffffff80322fab
> > #6 [ffffffff8046df40] do_nmi at ffffffff80323365
> > #7 [ffffffff8046df50] nmi at ffffffff8032268f
> > [exception RIP: smp_send_stop+84]
> > RIP: ffffffff80116e44 RSP: ffffffff8046ddd8 RFLAGS: 00000246
> > RAX: 00000000000000ff RBX: ffffffff8831c1f8 RCX: 000041049c7256e8
> > RDX: 0000000000000005 RSI: 000000005238a938 RDI: 00000000002896a0
> > RBP: ffffffff8046df08 R8: 00000000000040fb R9: 000000005238a7e8
> > R10: 0000000000000002 R11: 0000ffff0000ffff R12: 000000000000000c
> > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > --- <NMI exception stack> ---
> > #8 [ffffffff8046ddd8] smp_send_stop at ffffffff80116e44
From what you're implying, there is no physical "dump switch".
So I'm trying to figure out where that "dumpsw_notify()" function
comes from. Whose module is that, and what is its purpose?
Dave
> a reboot, and
> then crash was run on the resulting dump. In fact crash was
> automatically run via a startup script and there was no human
> intervention until after it was noticed that crash was filling up the
> root file system with a temporary file due to the infinite loop.
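
One detail in the backtrace quoted above is worth spelling out: the RSP saved in the NMI exception frame (ffffffff8046ddd8) falls between the stack addresses of frames #1 and #2, which are themselves on the NMI exception stack. The Python sketch below only checks that condition against the quoted frames. It is an illustration, not how crash itself walks stacks: the helper names are hypothetical, the min/max range test is a simplification of a real stack base/size check, and the idea that this self-reference is what sends the unwinder into its loop is an assumption, not something confirmed in this thread.

```python
import re

# Frames and exception RSP copied from the backtrace quoted in this thread.
BACKTRACE = """\
#0 [ffffffff8046dc50] machine_kexec at ffffffff8011a95b
#1 [ffffffff8046dd20] crash_kexec at ffffffff80154351
#2 [ffffffff8046dde0] panic at ffffffff801327fa
#3 [ffffffff8046ded0] dumpsw_notify at ffffffff8831c0c3
#4 [ffffffff8046dee0] notifier_call_chain at ffffffff8032481f
#5 [ffffffff8046df00] default_do_nmi at ffffffff80322fab
#6 [ffffffff8046df40] do_nmi at ffffffff80323365
#7 [ffffffff8046df50] nmi at ffffffff8032268f
RIP: ffffffff80116e44 RSP: ffffffff8046ddd8 RFLAGS: 00000246
#8 [ffffffff8046ddd8] smp_send_stop at ffffffff80116e44
"""

FRAME_RE = re.compile(r"#(\d+) \[([0-9a-f]{16})\] (\w+) at ([0-9a-f]{16})")
RSP_RE = re.compile(r"RSP: ([0-9a-f]{16})")


def parse_frames(text):
    """Return (frame number, stack address, symbol) for each '#N [addr] sym' line."""
    return [(int(n), int(sp, 16), name)
            for n, sp, name, _ in FRAME_RE.findall(text)]


def exception_rsp(text):
    """Return the RSP recorded in the exception frame, or None if absent."""
    match = RSP_RE.search(text)
    return int(match.group(1), 16) if match else None


def rsp_inside_exception_stack(text, handler="nmi"):
    """True if the saved RSP lies within the span of stack addresses used by
    the frames up to and including the handler entry frame.

    Simplification: a real tool would compare against the exception stack's
    base and size rather than the min/max of the frames already printed.
    """
    frames = parse_frames(text)
    entry = next(i for i, frame in enumerate(frames) if frame[2] == handler)
    on_stack = [sp for _, sp, _ in frames[:entry + 1]]
    rsp = exception_rsp(text)
    return rsp is not None and min(on_stack) <= rsp <= max(on_stack)
```

For the frames quoted above, rsp_inside_exception_stack(BACKTRACE) evaluates to True, meaning the "interrupted" context that the exception frame points back to is on the same stack the NMI handler is running on.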