[Linux-cluster] Node is randomly fenced

Schaefer, Micah Micah.Schaefer at jhuapl.edu
Wed Jun 11 18:55:07 UTC 2014


I have the issue on two of my nodes. Each node has a single 10 Gb connection
(no bonding, single link). What else can I look at? I manage the network too,
and I don't see any link-down notifications or any errors on the switch ports.
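
For what it's worth, this is roughly the host-side check I've been running on
each node (eth0 below is just a placeholder for the actual 10 Gb interface
name):

    # per-interface packet and error counters
    ip -s link show eth0
    # NIC driver statistics, filtered to errors/drops
    ethtool -S eth0 | grep -iE 'err|drop|crc'
    # negotiated speed/duplex and "Link detected"
    ethtool eth0
    # kernel messages about link flaps
    dmesg | grep -i 'eth0.*link'

The idea is just to rule out anything the NIC or kernel sees locally before
blaming the switch, and so far those counters look clean on both nodes.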




On 6/11/14, 2:29 PM, "Digimer" <lists at alteeve.ca> wrote:

>On 11/06/14 02:21 PM, Schaefer, Micah wrote:
>> It failed again, even after deleting all the other failover domains.
>> 
>> Cluster conf
>> http://pastebin.com/jUXkwKS4
>> 
>> I turned corosync output to debug. How can I go about troubleshooting if
>> it really is a network issue or something else?
>> 
>> 
>> 
>> Jun 09 13:06:59 corosync [QUORUM] Members[4]: 1 2 3 4
>> Jun 11 14:10:17 corosync [TOTEM ] A processor failed, forming new
>> configuration.
>> Jun 11 14:10:29 corosync [QUORUM] Members[3]: 1 2 3
>> Jun 11 14:10:29 corosync [TOTEM ] A processor joined or left the
>> membership and a new membership was formed.
>> Jun 11 14:10:29 corosync [CPG   ] chosen downlist: sender r(0)
>> ip(10.70.100.101) ; members(old:4 left:1)
>
>This is, to me, *strongly* indicative of a network issue. It's not
>likely switch-wide as only one member was lost, but I would certainly
>put my money on a network problem somewhere, somehow.
>
>Do you use bonding?
>
>-- 
>Digimer
>Papers and Projects: https://alteeve.ca/w/
>What if the cure for cancer is trapped in the mind of a person without
>access to education?
>
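
Coming back to my question above about proving whether it really is a network
issue: one thing I can try is leaving omping running between all four nodes
for a while and watching for loss on the corosync path, plus checking the ring
status on each node. Something like this (node1 through node4 are placeholders
for the actual cluster hostnames):

    # run on all four nodes at the same time; 600 probes at 1s intervals
    omping -c 600 -i 1 node1 node2 node3 node4
    # totem ring status as corosync sees it on this node
    corosync-cfgtool -s

If omping shows multicast loss while unicast stays clean, that would point at
the switch/IGMP snooping side rather than the hosts.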




