[Crash-utility] infinite loop in crash due to double-NMI on x86_64 system

Dave Anderson anderson at redhat.com
Mon Jun 28 21:14:47 UTC 2010


----- "Lucas Silacci" <Lucas.Silacci at teradata.com> wrote:

 
> Sorry, guess I wasn't clear. Nobody hit the dump switch on these
> systems. They simply had multiple hardware errors that apparently
> triggered the NMI more than once. That's what I was trying to show with
> the SEL records, that the multiple NMIs were straight from hardware with
> no human intervention.
> 
> The systems went through a panic (due to multiple NMIs), 

That's what I'm trying to figure out -- when and how was it decided that
the machine should panic instead of continuing to handle the stream of NMIs?

In other words, this "dumpsw_notify" function -- why was it called?

> > PID: 0      TASK: ffffffff8038c340  CPU: 0   COMMAND: "swapper"
> >  #0 [ffffffff8046dc50] machine_kexec at ffffffff8011a95b
> >  #1 [ffffffff8046dd20] crash_kexec at ffffffff80154351
> >  #2 [ffffffff8046dde0] panic at ffffffff801327fa
> >  #3 [ffffffff8046ded0] dumpsw_notify at ffffffff8831c0c3
> >  #4 [ffffffff8046dee0] notifier_call_chain at ffffffff8032481f
> >  #5 [ffffffff8046df00] default_do_nmi at ffffffff80322fab
> >  #6 [ffffffff8046df40] do_nmi at ffffffff80323365
> >  #7 [ffffffff8046df50] nmi at ffffffff8032268f
> >     [exception RIP: smp_send_stop+84]
> >     RIP: ffffffff80116e44  RSP: ffffffff8046ddd8  RFLAGS: 00000246
> >     RAX: 00000000000000ff  RBX: ffffffff8831c1f8  RCX: 000041049c7256e8
> >     RDX: 0000000000000005  RSI: 000000005238a938  RDI: 00000000002896a0
> >     RBP: ffffffff8046df08   R8: 00000000000040fb   R9: 000000005238a7e8
> >     R10: 0000000000000002  R11: 0000ffff0000ffff  R12: 000000000000000c
> >     R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
> >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > --- <NMI exception stack> ---
> >  #8 [ffffffff8046ddd8] smp_send_stop at ffffffff80116e44

From what you're implying, there is no physical "dump switch".
So I'm trying to figure out where that "dumpsw_notify()" function
comes from. Whose module is that, and what is its purpose?

Dave
 

> a reboot, and
> then crash was run on the resulting dump. In fact crash was
> automatically run via a startup script and there was no human
> intervention until after it was noticed that crash was filling up the
> root file system with a temporary file due to the infinite loop.