[Linux-cluster] Node is randomly fenced

Digimer lists at alteeve.ca
Wed Jun 11 18:29:30 UTC 2014


On 11/06/14 02:21 PM, Schaefer, Micah wrote:
> It failed again, even after deleting all the other failover domains.
> 
> Cluster conf
> http://pastebin.com/jUXkwKS4
> 
> I turned corosync output to debug. How can I go about troubleshooting if
> it really is a network issue or something else?
> 
> 
> 
> Jun 09 13:06:59 corosync [QUORUM] Members[4]: 1 2 3 4
> Jun 11 14:10:17 corosync [TOTEM ] A processor failed, forming new configuration.
> Jun 11 14:10:29 corosync [QUORUM] Members[3]: 1 2 3
> Jun 11 14:10:29 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jun 11 14:10:29 corosync [CPG   ] chosen downlist: sender r(0) ip(10.70.100.101) ; members(old:4 left:1)

This is, to me, *strongly* indicative of a network issue. It's not
likely to be switch-wide, as only one member (node 4) was lost, but I
would certainly put my money on a network problem somewhere, somehow.
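One way to start separating a network fault from something else is to
pin down exactly when each member dropped out, then compare those times
against the switch, NIC and bonding logs for that node. A minimal sketch
in Python, assuming the corosync log lines look like the excerpt above;
the regexes and the `membership_changes` helper are hypothetical, not
part of corosync:

```python
import re

# Hypothetical patterns: they assume lines formatted like the excerpt above,
# i.e. "Mon DD HH:MM:SS corosync [SUBSYS] message".
MEMBERS_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) corosync \[QUORUM\] "
    r"Members\[\d+\]:(?P<ids>(?: \d+)*)$"
)
FAILED_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) corosync \[TOTEM ?\] "
    r"A processor failed"
)


def membership_changes(lines):
    """Yield (failure_ts, change_ts, left_ids, joined_ids) per membership change.

    failure_ts is the most recent "A processor failed" timestamp seen before
    the change, or None if no such line preceded it.
    """
    previous = None   # node IDs from the last Members[...] line
    failure_ts = None  # last TOTEM failure timestamp not yet paired
    for raw in lines:
        line = raw.strip()
        failed = FAILED_RE.match(line)
        if failed:
            failure_ts = failed.group("ts")
            continue
        m = MEMBERS_RE.match(line)
        if not m:
            continue
        current = {int(x) for x in m.group("ids").split()}
        if previous is not None and current != previous:
            yield (failure_ts, m.group("ts"),
                   sorted(previous - current), sorted(current - previous))
            failure_ts = None
        previous = current


if __name__ == "__main__":
    sample = [
        "Jun 09 13:06:59 corosync [QUORUM] Members[4]: 1 2 3 4",
        "Jun 11 14:10:17 corosync [TOTEM ] A processor failed, forming new configuration.",
        "Jun 11 14:10:29 corosync [QUORUM] Members[3]: 1 2 3",
    ]
    for failed_at, changed_at, left, joined in membership_changes(sample):
        print(f"failure seen {failed_at}, new membership {changed_at}, "
              f"left={left} joined={joined}")
```

Run against the excerpt above, this reports node 4 leaving, with the
failure first noticed at 14:10:17 and the new membership formed at
14:10:29. Feeding it the full corosync log from every node and lining the
timestamps up with the switch port logs for node 4 should show whether a
link event coincides with each drop.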

Do you use bonding?

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?