[Crash-utility] infinite loop in crash due to double-NMI on x86_64 system

Tue Jun 29 17:37:37 UTC 2010

----- "Lucas Silacci" <Lucas.Silacci at teradata.com> wrote:

> My only guess is that something in the transition between the
> regular kernel and the kdump kernel (somewhere in the kexec path)
> re-opens the door for a queued-up NMI to come in just before the kdump
> kernel takes over. I've been digging through that code, but so far I
> haven't found anything that explains it.

Right -- I'm wondering who called smp_send_stop() while it was running 
on the NMI exception stack?

> PID: 0      TASK: ffffffff8038c340  CPU: 0   COMMAND: "swapper"
>  #0 [ffffffff8046dc50] machine_kexec at ffffffff8011a95b
>  #1 [ffffffff8046dd20] crash_kexec at ffffffff80154351
>  #2 [ffffffff8046dde0] panic at ffffffff801327fa
>  #3 [ffffffff8046ded0] dumpsw_notify at ffffffff8831c0c3
>  #4 [ffffffff8046dee0] notifier_call_chain at ffffffff8032481f
>  #5 [ffffffff8046df00] default_do_nmi at ffffffff80322fab
>  #6 [ffffffff8046df40] do_nmi at ffffffff80323365
>  #7 [ffffffff8046df50] nmi at ffffffff8032268f
>     [exception RIP: smp_send_stop+84]
>     RIP: ffffffff80116e44  RSP: ffffffff8046ddd8  RFLAGS: 00000246
>     RAX: 00000000000000ff  RBX: ffffffff8831c1f8  RCX: 000041049c7256e8
>     RDX: 0000000000000005  RSI: 000000005238a938  RDI: 00000000002896a0
>     RBP: ffffffff8046df08   R8: 00000000000040fb   R9: 000000005238a7e8
>     R10: 0000000000000002  R11: 0000ffff0000ffff  R12: 000000000000000c
>     R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> --- <NMI exception stack> ---
>  #8 [ffffffff8046ddd8] smp_send_stop at ffffffff80116e44
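
For what it's worth, the overlap can be checked with plain arithmetic on the
addresses quoted above. The sketch below is only an illustration: the helper
names are made up for this note, the values are copied by hand from the
backtrace, and it says nothing about who called smp_send_stop(). It just shows
that the interrupted RSP sits inside the address range used by the NMI handler
frames #0 to #7, and which of those frames lie below it (deeper, assuming the
usual downward-growing x86_64 stack).

```python
# Frame addresses copied by hand from the quoted backtrace (frames #0 to #7).
# The dict and function names are hypothetical, chosen only for this sketch.
NMI_HANDLER_FRAMES = {
    "machine_kexec":       0xffffffff8046dc50,
    "crash_kexec":         0xffffffff8046dd20,
    "panic":               0xffffffff8046dde0,
    "dumpsw_notify":       0xffffffff8046ded0,
    "notifier_call_chain": 0xffffffff8046dee0,
    "default_do_nmi":      0xffffffff8046df00,
    "do_nmi":              0xffffffff8046df40,
    "nmi":                 0xffffffff8046df50,
}

# RSP saved in the exception frame, i.e. where smp_send_stop+84 was running.
INTERRUPTED_RSP = 0xffffffff8046ddd8


def rsp_within_frames(rsp, frames):
    """True if rsp lies between the lowest and highest handler frame address."""
    lo, hi = min(frames.values()), max(frames.values())
    return lo <= rsp <= hi


def frames_below(rsp, frames):
    """Names of frames at addresses lower than rsp, ordered by address.

    Assumes the stack grows downward, so these frames were pushed past the
    point where the interrupted code's stack pointer was.
    """
    return [name for name in sorted(frames, key=frames.get) if frames[name] < rsp]


if __name__ == "__main__":
    print(rsp_within_frames(INTERRUPTED_RSP, NMI_HANDLER_FRAMES))
    print(frames_below(INTERRUPTED_RSP, NMI_HANDLER_FRAMES))
```

Run against the values above, the check reports that the interrupted RSP falls
inside the handler's own range, with the machine_kexec and crash_kexec frames
below it. That matches the reading that the interrupted code was already
running on the NMI exception stack when the NMI in frames #5 to #7 arrived.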