[Linux-cluster] Node is randomly fenced
lists at alteeve.ca
Thu Jun 12 17:08:07 UTC 2014
On 12/06/14 12:48 PM, Schaefer, Micah wrote:
> As far as the switch goes, both are Cisco Catalyst 6509-E, no spanning
> tree changes are happening and all the ports have port-fast enabled for
> these servers. My switch logging level is very high and I have no messages
> in relation to the time frames or ports.
> TOTEM reports that “A processor joined or left the membership…”, but that
> isn’t enough detail.
> Also note that I did not have these issues until adding new servers: node3
> and node4 to the cluster. Node1 and node2 do not fence each other (unless
> a real issue is there), and they are on different switches.
Then I can't imagine it being the network anymore. Seeing as both nodes 3
and 4 get fenced, it's likely not hardware either. Are the workloads on 3
and 4 much higher (or are the computers much slower) than on 1 and 2? I'm
wondering if the nodes are simply not keeping up with corosync traffic.
You might try raising the corosync token timeout and retransmit count
to see if that reduces the node losses.
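
As a rough sketch, assuming corosync.conf is edited directly (on a
cman-based cluster, these would instead be attributes on the <totem>
element in cluster.conf), the two settings look something like this. The
values are illustrative starting points only, not recommendations:

```
totem {
        version: 2
        # Time in milliseconds to wait for the token before declaring it lost.
        token: 10000
        # Number of token retransmits to attempt before forming a new membership.
        token_retransmits_before_loss_const: 10
}
```

Keep the values identical on every node and raise them gradually. A
longer token timeout also means a node that really is dead takes longer
to be detected and fenced.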
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?