[Linux-cluster] Node is randomly fenced
Digimer lists at alteeve.ca
Wed Jun 11 19:28:28 UTC 2014
The first thing I would do is get a second NIC and configure
active-passive bonding. Network issues are too common to ignore in HA
setups. Ideally, I would span the links across separate stacked switches.
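A minimal sketch of what that looks like with RHEL-style network scripts
(the device names and the IP here are placeholders, not from Micah's
actual setup):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    BONDING_OPTS="mode=active-backup miimon=100"
    IPADDR=10.70.100.101
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (and the same for eth1)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes

With mode=active-backup only one link carries traffic at a time, so one
switch can fail or be serviced without the node dropping out of the
cluster.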
As for debugging the issue, I can only recommend looking closely at the
system and switch logs for clues.
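For example, something like this (log locations vary by distro, so treat
the paths as assumptions):

    # current ring status as corosync sees it
    corosync-cfgtool -s

    # membership and token events around the time of the fence
    grep -E 'TOTEM|QUORUM' /var/log/messages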
On 11/06/14 02:55 PM, Schaefer, Micah wrote:
> I have the issue on two of my nodes. Each node has one 10 Gb connection: no
> bonding, single link. What else can I look at? I manage the network too. I
> don't see any link-down notifications and don't see any errors on the ports.
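Link-down notifications aren't the only tell, though; per-interface error
and drop counters can expose a flaky link even when the port never goes
down. Roughly (eth0 standing in for whatever your 10Gb interface is):

    # kernel RX/TX error and drop counters
    ip -s link show eth0

    # driver-level counters; the names vary by NIC driver
    ethtool -S eth0 | grep -iE 'err|drop|crc'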
> On 6/11/14, 2:29 PM, "Digimer" <lists at alteeve.ca> wrote:
>> On 11/06/14 02:21 PM, Schaefer, Micah wrote:
>>> It failed again, even after deleting all the other failover domains.
>>> Cluster conf
>>> I turned corosync's output up to debug. How can I go about troubleshooting
>>> whether it really is a network issue or something else?
>>> Jun 09 13:06:59 corosync [QUORUM] Members: 1 2 3 4
>>> Jun 11 14:10:17 corosync [TOTEM ] A processor failed, forming new
>>> configuration.
>>> Jun 11 14:10:29 corosync [QUORUM] Members: 1 2 3
>>> Jun 11 14:10:29 corosync [TOTEM ] A processor joined or left the
>>> membership and a new membership was formed.
>>> Jun 11 14:10:29 corosync [CPG ] chosen downlist: sender r(0)
>>> ip(10.70.100.101) ; members(old:4 left:1)
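For reference, on a cman-based cluster the debug logging Micah mentions is
normally switched on in cluster.conf, along these lines (a sketch, not his
actual config):

    <!-- inside <cluster> in /etc/cluster/cluster.conf -->
    <logging debug="on"/>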
>> This is, to me, *strongly* indicative of a network issue. It's not
>> likely switch-wide, as only one member was lost, but I would certainly
>> put my money on a network problem somewhere, somehow.
>> Do you use bonding?
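One stopgap while hunting the underlying network problem is to raise the
totem token timeout so that a brief blip no longer triggers fencing. In
cluster.conf that would look something like this (the value is in
milliseconds and purely illustrative; tune it to your environment):

    <!-- tolerate up to 30s of token loss before declaring a node dead -->
    <totem token="30000"/>

That only masks the symptom, of course; the flaky link still needs finding.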
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?