RE: [Linux-cluster] dlm_recvd + bnx2 oops
Kovacs, Corey J.
cjk at techma.com
Mon Nov 13 15:15:57 UTC 2006
Ok, that's sort of what I thought was going on but I wanted to get some
feedback. There is another bug in bugzilla that looks like it might be
related.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212055
Anyway, thanks
Corey
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
Sent: Monday, November 13, 2006 9:14 AM
To: linux clustering
Subject: Re: [Linux-cluster] dlm_recvd + bnx2 oops
Kovacs, Corey J. wrote:
> Morning all. We've been experiencing regular cluster crashes on RHEL4u4.
> This system has 5 nodes and a few dozen nodes mounting shares via nfs.
> Periodically, nodes will panic, get fenced and all continues on. This
> system does have some of the HP Product Support Pack installed (not
> the HP bnx2 driver). Below is the section from the logs. It is hand
> typed but I am fairly sure it's accurate.
>
> The machines are HP DL360-G5's. The nics are Broadcom NetXtreme II 5708's.
>
>
> Anyone else seeing this?
>
> Corey
>
> ===========================================================
>
> Unable to handle kernel NULL pointer dereference at virtual address 000000ac
> printing eip:
> f8f339ae
> *pde = 37038001
> Oops: 0000 [#1]
> SMP
> Modules linked in: ipt_multiport iptable_nat ip_conntrack ip_tables
> ip_vs_rr ip_vs cpqci(U) ipmi_devintf ipmi_si ipmi_msghandler xp(U)
> mptctl mptbase sg autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U)
> lock_harness(U) dlm(U) cman(U) md5 ipv6 nfsd exportfs lockd nfs_acl
> sunrpc joydev dm_mirror button battery ac ehci_hcd uhci_hcd bnx2 ext3
> jbd dm_mod qla6312(U) qla2400(U) qla2300(U) qla2xxx_conf(U) qla2xxx(U)
> cciss sd_mod scsi_mod
> CPU: 0
> EIP: 0060:[<f8f339ae>] Tainted: P VLI
> EFLAGS: 00010202 (2.6.9-42.0.2.ELsmp)
> EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]
> eax: f70620dc ebx: 00000ad7 ecx: 00000002 edx: 00000037
> esi: 00000a37 edi: 00000000 ebp: f6a0b200 esp: c03cefa0
> ds: 007b es: 007b ss: 0068
> Process dlm_recvd (pid: 3973, threadinfo=c03ce000 task=f71652f0)
> Stack: f70620dc 00000037 f5c19000 00000000 f6a0b200 f6a0afc0 c03cefd4 f8f3431d
>        00000000 f6a0afc0 c201fd80 15a3182b c0280e24 000493dc 00000001 c0392c18
>        0000000a 00000000 c01269b8 f59d4dc4 00000046 c038b900 f59d4000 c010819f
> Call trace:
> [<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
> [<c0280e24>] net_rx_action+0xae/0x160
> [<c01269b8>] __do_softirq+0x4c/0xb1
> [<c010819f>] do_softirq+0x4f/0x56
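That looks like a driver crash to me. It shows up in dlm_recvd probably only
because that is a busy process doing lots of network I/O; the softirq that
crashed simply ran in whatever context was current at the time. There's no
DLM code in the stack trace at all: every frame is either in bnx2 or in the
core network/softirq path.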
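
For anyone who wants to repeat that check on their own oops, here is a
minimal Python sketch that pulls the frames out of the call trace quoted
above and lists which module each one belongs to. The helper name `frames`
and the regular expression are my own assumptions, not part of any kernel
tooling, and frames without a bracketed module name are simply labelled
"kernel".

```python
import re

# Matches entries like "[<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]".
# The trailing "[module]" is optional; core kernel symbols have none.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def frames(trace):
    """Return (symbol, module) pairs; module is 'kernel' when none is given."""
    return [(sym, mod or "kernel") for _addr, sym, mod in FRAME_RE.findall(trace)]


# Call trace copied from the report, including its mail-wrapped line breaks.
TRACE = """
[<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2] [<c0280e24>]
net_rx_action+0xae/0x160 [<c01269b8>] __do_softirq+0x4c/0xb1
[<c010819f>] do_softirq+0x4f/0x56
"""

for sym, mod in frames(TRACE):
    print(f"{mod:8} {sym}")

modules = {mod for _sym, mod in frames(TRACE)}
print("dlm in trace:", "dlm" in modules)
```

Run against this trace it reports only bnx2 and core kernel frames, which is
why I'd look at the driver rather than the DLM.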
--
patrick
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster