[Linux-cluster] Node is randomly fenced
lists at alteeve.ca
Thu Jun 12 16:36:12 UTC 2014
On 12/06/14 12:33 PM, yvette hirth wrote:
> On 06/12/2014 08:32 AM, Schaefer, Micah wrote:
>> Yesterday I added bonds on nodes 3 and 4. Today, node4 was active and
>> fenced, then node3 was fenced when node4 came back online. The network
>> topology is as follows:
>> switch1: node1, node3 (two connections)
>> switch2: node2, node4 (two connections)
>> switch1 <--> switch2
>> All on the same subnet
>> I set up link monitoring of the NICs at 100-millisecond intervals in
>> active-backup mode, and saw no messages about link problems before the
>> fence.
>> I see multicast between the servers using tcpdump.
>> Any more ideas?
> spanning-tree scans/rebuilds happen on 10Gb circuits just like they do
> on 1Gb circuits, and when they happen, traffic on the switches *can*
> come to a grinding halt, depending upon the switch firmware and the type
> of spanning-tree scan/rebuild being done.
> you may want to check your switch logs to see if any spanning-tree
> rebuilds were being done at the time of the fence.
> just an idea, and hth
> yvette hirth
When I've seen this (I now disable STP entirely), it blocked all traffic,
so I would expect multiple or all nodes to partition off on their own,
not just a single node.
Still, worth looking into. :)
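
Since the 100 ms link monitoring logged nothing, one more place to look
is the bonding driver's per-slave counters in /proc/net/bonding/. Below
is a minimal Python sketch, not a tested tool. It assumes the bond is
named bond0 (the thread does not name it) and that the status file has
the usual "Slave Interface:" and "Link Failure Count:" lines. The
function name link_failure_counts is made up for illustration.

```python
import re
from pathlib import Path


def link_failure_counts(status_text):
    """Map each slave interface to its 'Link Failure Count' value."""
    counts = {}
    current = None  # slave section we are currently inside, if any
    for line in status_text.splitlines():
        slave = re.match(r"Slave Interface:\s*(\S+)", line)
        if slave:
            current = slave.group(1)
            continue
        failures = re.match(r"Link Failure Count:\s*(\d+)", line)
        if failures and current is not None:
            counts[current] = int(failures.group(1))
    return counts


if __name__ == "__main__":
    # "bond0" is an assumed name; substitute the bond configured on the node.
    path = Path("/proc/net/bonding/bond0")
    if path.exists():
        for nic, count in link_failure_counts(path.read_text()).items():
            print(f"{nic}: {count} link failure(s)")
    else:
        print(f"{path} not found; is the bonding module loaded?")
```

A non-zero count on either slave would suggest a link did drop even
though nothing showed up in the logs. All zeros would point back at
the switches.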
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?