[Linux-cluster] dlm_recvd + bnx2 oops
Patrick Caulfield
pcaulfie at redhat.com
Mon Nov 13 14:13:40 UTC 2006
Kovacs, Corey J. wrote:
> Morning all. We've been experiencing regular cluster crashes on RHEL4u4.
> This system has 5 nodes and a few dozen nodes mounting shares via NFS.
> Periodically, nodes panic, get fenced, and everything carries on. This system
> does have some of the HP Product Support Pack installed (not the HP bnx2
> driver). Below is the relevant section from the logs. It is hand-typed, but
> I am fairly sure it's accurate.
>
> The machines are HP DL360 G5s. The NICs are Broadcom NetXtreme II 5708s.
>
>
> Anyone else seeing this?
>
> Corey
>
> ===========================================================
>
> Unable to handle kernel NULL pointer dereference at virtual address 000000ac
> printing eip:
> f8f339ae
> *pde = 37038001
> Oops: 0000 [#1]
> SMP
> Modules linked in: ipt_multiport iptable_nat ip_conntrack ip_tables
> ip_vs_rr ip_vs cpqci(U) ipmi_devintf ipmi_si ipmi_msghandler xp(U)
> mptctl mptbase sg autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U)
> lock_harness(U) dlm(U) cman(U) md5 ipv6 nfsd exportfs lockd nfs_acl
> sunrpc joydev dm_mirror button battery ac ehci_hcd uhci_hcd bnx2 ext3
> jbd dm_mod qla6312(U) qla2400(U) qla2300(U) qla2xxx_conf(U) qla2xxx(U)
> cciss sd_mod scsi_mod
> CPU: 0
> EIP: 0060:[<f8f339ae>] Tainted: P VLI
> EFLAGS: 00010202 (2.6.9-42.0.2.ELsmp)
> EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]
> eax: f70620dc ebx: 00000ad7 ecx: 00000002 edx: 00000037
> esi: 00000a37 edi: 00000000 ebp: f6a0b200 esp: c03cefa0
> ds: 007b es: 007b ss: 0068
> Process dlm_recvd (pid: 3973, threadinfo=c03ce000 task=f71652f0)
> Stack: f70620dc 00000037 f5c19000 00000000 f6a0b200 f6a0afc0 c03cefd4 f8f3431d
>        00000000 f6a0afc0 c201fd80 15a3182b c0280e24 000493dc 00000001 c0392c18
>        0000000a 00000000 c01269b8 f59d4dc4 00000046 c038b900 f59d4000 c010819f
> Call trace:
> [<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
> [<c0280e24>] net_rx_action+0xae/0x160
> [<c01269b8>] __do_softirq+0x4c/0xb1
> [<c010819f>] do_softirq+0x4f/0x56
That looks like a driver crash to me. It probably shows up in dlm_recvd only because
that is a busy process doing lots of network I/O: the oops is in bnx2_tx_int, reached
from softirq context via bnx2_poll. There's no DLM code in the stack trace at all.
--
patrick
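
The same check can be done mechanically: list which module owns each frame of the call trace and see whether any cluster module appears. The following is a minimal sketch, not part of the original thread. It assumes the 2.6-era oops frame format shown above (`[<addr>] symbol+0xoff/0xlen [module]`, with no bracketed module for built-in kernel symbols). The helper name `modules_in_trace` and the `CLUSTER_MODULES` set are hypothetical, the latter picked from the "Modules linked in" list.

```python
import re

# Frames copied from the call trace in the oops above.
CALL_TRACE = """\
[<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
[<c0280e24>] net_rx_action+0xae/0x160
[<c01269b8>] __do_softirq+0x4c/0xb1
[<c010819f>] do_softirq+0x4f/0x56
"""

# Assumed frame layout: [<addr>] symbol+0xoffset/0xlength [optional module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def modules_in_trace(trace):
    """Return (symbol, module) for each frame; built-in symbols map to 'kernel'."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


# Hypothetical list of cluster-stack modules, taken from "Modules linked in".
CLUSTER_MODULES = {"dlm", "lock_dlm", "lock_harness", "gfs", "cman"}

frames = modules_in_trace(CALL_TRACE)
for symbol, module in frames:
    print(f"{module:8} {symbol}")

cluster_frames = [frame for frame in frames if frame[1] in CLUSTER_MODULES]
print("cluster frames:", cluster_frames)
```

For this trace the only non-core module in the frames is bnx2 and the cluster list comes back empty, which matches the reading above: the process named in the oops is simply whatever was running when the softirq fired.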