[Linux-cluster] fencing for no reason that I can see
Terry
td3201 at gmail.com
Tue Sep 11 02:08:37 UTC 2012
On Mon, Sep 10, 2012 at 8:27 PM, Terry <td3201 at gmail.com> wrote:
> Hello,
>
> I have seen this a few times where one node stops seeing the other
> node for some unknown reason and fences it. Any idea how I can debug
> this? Here's from the node doing the fencing:
>
>
> Sep 10 19:01:23 omadvnfs01a corosync[10371]: [TOTEM ] A processor
> failed, forming new configuration.
> Sep 10 19:01:25 omadvnfs01a corosync[10371]: [QUORUM] Members[1]: 1
> Sep 10 19:01:25 omadvnfs01a corosync[10371]: [TOTEM ] A processor
> joined or left the membership and a new membership was formed.
> Sep 10 19:01:25 omadvnfs01a rgmanager[10692]: State change:
> omadvnfs01b.sec.jel.lc DOWN
> Sep 10 19:01:25 omadvnfs01a corosync[10371]: [CPG ] chosen
> downlist: sender r(0) ip(10.198.1.110) ; members(old:2 left:1)
> Sep 10 19:01:25 omadvnfs01a corosync[10371]: [MAIN ] Completed
> service synchronization, ready to provide service.
> Sep 10 19:01:25 omadvnfs01a fenced[10427]: fencing node omadvnfs01b.sec.jel.lc
>
>
> And here is from the fenced node:
>
> Sep 10 17:09:27 omadvnfs01b rpc.idmapd[6126]: nfsdcb:
> read(/proc/net/rpc/nfs4.idtoname/channel) failed: errno 0 (End of
> File)
> Sep 10 17:14:47 omadvnfs01b rpc.idmapd[6125]: nfsdcb:
> read(/proc/net/rpc/nfs4.idtoname/channel) failed: errno 0 (End of
> File)
> Sep 10 19:04:44 omadvnfs01b kernel: imklog 5.8.10, log source =
> /proc/kmsg started.
> Sep 10 19:04:44 omadvnfs01b rsyslogd: [origin software="rsyslogd"
> swVersion="5.8.10" x-pid="2379" x-info="http://www.rsyslog.com"] start
>
>
> I did notice that the clocks on the two nodes were about 40 seconds
> apart. I just fixed that, but what else can I look for here? Our
> monitoring noticed at 19:02:30 that the fenced node was off the grid,
> which is a little after it was fenced. What test is performed to see
> if the other node is up? How many times does it try?
>
> Thanks!
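Because the two nodes' clocks were about 40 seconds apart, the log excerpts above are hard to line up by eye. Here is a minimal sketch of merging both logs into one timeline. The function names are made up for illustration, the year is taken from the date of this post (syslog stamps carry none), and the sign of the offset is an assumption, since I did not record which node was ahead.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year=2012):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    The year is an assumption: classic syslog stamps do not include one.
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def merge_timeline(logs, offsets):
    """Merge per-host log lines into one list sorted by corrected time.

    logs:    {hostname: [syslog lines]}
    offsets: {hostname: seconds to add to that host's stamps}
             (hypothetical correction values; hosts not listed get 0)
    """
    events = []
    for host, lines in logs.items():
        shift = timedelta(seconds=offsets.get(host, 0))
        for line in lines:
            events.append((parse_syslog_time(line) + shift, host, line))
    return sorted(events)
```

Calling merge_timeline with the lines from each node and an offset of 40 or -40 for one host gives a single ordered list, which makes it easier to see what the fenced node last logged relative to the "A processor failed" message.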
I guess I should have read the docs more thoroughly. Straight from the
RHEL 6 cluster guide:

  "Ensure that exotic bond modes and VLAN tagging are not in use on
  interfaces that the cluster uses for inter-node communication."

I am using a three-interface 802.3ad link aggregate on the production
network. I could either move inter-node traffic to an iSCSI interface
or split one of the three bond slave interfaces out and dedicate it to
inter-node traffic.
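
To confirm which bond mode an interface is actually running, the kernel bonding driver exposes its status as text under /proc/net/bonding/. Below is a rough sketch of pulling the mode and slave list out of that text. The interface name bond0 and the helper names are assumptions for illustration, not something taken from the cluster guide.

```python
import re


def bonding_mode(text):
    """Return the value of the 'Bonding Mode:' line, or None if absent."""
    match = re.search(r"^Bonding Mode:\s*(.+)$", text, re.MULTILINE)
    return match.group(1).strip() if match else None


def slave_interfaces(text):
    """Return the interface names from every 'Slave Interface:' line."""
    return re.findall(r"^Slave Interface:\s*(\S+)", text, re.MULTILINE)


def read_bond_status(path="/proc/net/bonding/bond0"):
    """Read the bonding status file; bond0 is an assumed interface name."""
    with open(path) as handle:
        return handle.read()
```

On a cluster node, bonding_mode(read_bond_status()) returns the mode string for that bond. With the aggregate described above I would expect it to mention 802.3ad, which is what the guide's warning about bond modes would apply to.
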
I was also looking into a potential multicast issue, but I believe my
switches (Foundry FLS) support it fine, and I wouldn't expect a
multicast problem to be intermittent like this. Does anyone have any
other thoughts?