RE: [Linux-cluster] dlm_recvd + bnx2 oops

Mon Nov 13 15:15:57 UTC 2006

Ok, that's sort of what I thought was going on but I wanted to get some
feedback. There is another bug in bugzilla that looks like it might be
related.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212055

Anyway, thanks

Corey

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Patrick Caulfield
Sent: Monday, November 13, 2006 9:14 AM
To: linux clustering
Subject: Re: [Linux-cluster] dlm_recvd + bnx2 oops

Kovacs, Corey J. wrote:
> Morning all. We've been experiencing regular cluster crashes on RHEL4u4.
> This system has 5 nodes and a few dozen nodes mounting shares via nfs.
> Periodically, nodes will panic, get fenced and all continues on. This 
> system does have some of the HP Product Support Pack installed (not 
> the HP bnx2 driver). Below is the section from the logs. It is hand 
> typed but I am fairly sure it's accurate.
> 
> The machines are HP DL360-G5's. The NICs are Broadcom NetXtreme II 5708's.
> 
> 
> Anyone else seeing this?
> 
> Corey
> 
> ===========================================================
> 
> Unable to handle kernel NULL pointer dereference at virtual address 000000ac
>  printing eip:
> f8f339ae
> *pde = 37038001
> Oops: 0000 [#1]
> SMP
> Modules linked in: ipt_multiport iptable_nat ip_conntrack ip_tables 
> ip_vs_rr ip_vs cpqci(U) ipmi_devintf ipmi_si ipmi_msghandler xp(U) 
> mptctl mptbase sg autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U)
> lock_harness(U) dlm(U) cman(U) md5 ipv6 nfsd exportfs lockd nfs_acl 
> sunrpc joydev dm_mirror button battery ac ehci_hcd uhci_hcd bnx2 ext3 
> jbd dm_mod qla6312(U) qla2400(U) qla2300(U) qla2xxx_conf(U) qla2xxx(U) 
> cciss sd_mod scsi_mod
> CPU:    0
> EIP:    0060:[<f8f339ae>]     Tainted: P    VLI
> EFLAGS: 00010202    (2.6.9-42.0.2.ELsmp)
> EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]
> eax: f70620dc   ebx:  00000ad7   ecx:  00000002   edx:  00000037
> esi: 00000a37   edi:  00000000   ebp:  f6a0b200   esp:  c03cefa0
> ds:  007b    es: 007b   ss: 0068
> Process dlm_recvd (pid: 3973, threadinfo=c03ce000 task=f71652f0)
> Stack: f70620dc 00000037 f5c19000 00000000 f6a0b200 f6a0afc0 c03cefd4 f8f3431d
>        00000000 f6a0afc0 c201fd80 15a3182b c0280e24 000493dc 00000001 c0392c18
>        0000000a 00000000 c01269b8 f59d4dc4 00000046 c038b900 f59d4000 c010819f
> Call trace:
>  [<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
>  [<c0280e24>] net_rx_action+0xae/0x160
>  [<c01269b8>] __do_softirq+0x4c/0xb1
>  [<c010819f>] do_softirq+0x4f/0x56

That looks like a driver crash to me. The fact that it's in dlm_recvd
probably just means that it's a busy process doing lots of network I/O, so
it happened to be the one running when the softirq fired. There's no DLM
code in the stack trace at all.
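
For anyone who wants to check a trace like this mechanically, the module
names can be pulled out of the "EIP is at" line and the call trace. What
follows is only a rough Python sketch, not part of any cluster or kernel
tool: the regular expressions, the function names (trace_frames,
modules_in_trace) and the trimmed sample text are assumptions made for
illustration, based on the i386 2.6.9 oops layout quoted above, where a
frame from a loadable module ends in "[module]" and a built-in kernel
symbol has no suffix.

```python
import re

# Hypothetical helper patterns, written against the oops layout quoted above.
# A call-trace frame looks like: [<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
# The trailing [module] is absent for symbols built into the kernel.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{8,16}>\]\s+"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>[\w-]+)\])?"
)
# The faulting instruction: EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]
EIP_RE = re.compile(
    r"EIP is at (?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(?P<module>[\w-]+)\])?"
)


def trace_frames(oops_text):
    """Return (symbol, module) pairs, faulting frame first.

    module is None for symbols that carry no [module] suffix.
    """
    frames = []
    eip = EIP_RE.search(oops_text)
    if eip:
        frames.append((eip.group("symbol"), eip.group("module")))
    for match in FRAME_RE.finditer(oops_text):
        frames.append((match.group("symbol"), match.group("module")))
    return frames


def modules_in_trace(oops_text):
    """Return the set of loadable modules that own at least one frame."""
    return {module for _, module in trace_frames(oops_text) if module}


# Trimmed copy of the oops from this thread, one frame per line.
SAMPLE_OOPS = """\
EIP:    0060:[<f8f339ae>]     Tainted: P    VLI
EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]
Process dlm_recvd (pid: 3973, threadinfo=c03ce000 task=f71652f0)
Call trace:
 [<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
 [<c0280e24>] net_rx_action+0xae/0x160
 [<c01269b8>] __do_softirq+0x4c/0xb1
 [<c010819f>] do_softirq+0x4f/0x56
"""

if __name__ == "__main__":
    for symbol, module in trace_frames(SAMPLE_OOPS):
        print(f"{symbol:16s} {module or '(built-in)'}")
    print("modules in trace:", sorted(modules_in_trace(SAMPLE_OOPS)))
```

On the sample, the only module that owns any frame is bnx2; dlm shows up
solely in the "Process" line, which names whichever task was on the CPU
when the softirq ran rather than the code that faulted.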

-- 

patrick

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster