[Linux-cluster] Node is randomly fenced
lists at alteeve.ca
Wed Jun 4 15:13:15 UTC 2014
On 04/06/14 10:59 AM, Schaefer, Micah wrote:
> I have a 4 node cluster, running a single service group. I have been
> seeing node1 fence node3 while node3 is actively running the service group
> at random intervals.
> Rgmanager logs show no failures in service checks, and no other logs
> provide any useful information. How can I go about finding out why node1
> is fencing node3?
> I currently set up the failover domain to be restricted and not include
> cluster.conf : http://pastebin.com/xYy6xp6N
Random fencing is almost always caused by network failures. Can you look
at the system logs, starting a little before the fence and continuing
until after the fence completes, and paste them here? I suspect you will
see corosync complaining.
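For anyone following along, a rough sketch of pulling that window out of a log file is below. It assumes the classic syslog line format ("Jun  4 15:10:02 node1 corosync[1234]: ..."), which has no year, so the year is taken from the fence time you pass in. It also assumes the keyword list (corosync, totem, fenced, rgmanager) is a reasonable set of lines to flag. The log path in the usage comment (/var/log/messages, as on a typical RHEL 6 install) is an assumption too, so adjust it for your nodes. Run it on every node, not just the one that did the fencing.

```python
import re
from datetime import datetime, timedelta

# Assumed classic syslog prefix: "Jun  4 15:10:02 node1 corosync[1234]: msg"
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<proc>[^:\[]+)"
)

# Hypothetical choice of keywords worth highlighting around a fence event.
KEYWORDS = ("corosync", "TOTEM", "totem", "fenced", "rgmanager")


def parse_ts(stamp, year):
    """Parse a syslog timestamp; syslog omits the year, so the caller supplies it."""
    normalized = " ".join(stamp.split())  # "Jun  4 ..." -> "Jun 4 ..."
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


def fence_window(lines, fence_time,
                 before=timedelta(minutes=5), after=timedelta(minutes=2)):
    """Yield (line, flagged) for syslog lines inside the window around fence_time.

    Lines that do not look like syslog entries are skipped. `flagged` is True
    when the line mentions one of KEYWORDS.
    """
    start, end = fence_time - before, fence_time + after
    for line in lines:
        m = SYSLOG_RE.match(line)
        if not m:
            continue
        ts = parse_ts(m.group("ts"), fence_time.year)
        if start <= ts <= end:
            yield line.rstrip("\n"), any(k in line for k in KEYWORDS)


if __name__ == "__main__":
    import sys

    # Usage (assumed path and time format):
    #   python fence_window.py /var/log/messages "2014 Jun 04 15:10:00"
    path, when = sys.argv[1], sys.argv[2]
    fence_time = datetime.strptime(when, "%Y %b %d %H:%M:%S")
    with open(path) as fh:
        for text, flagged in fence_window(fh, fence_time):
            print(("!! " if flagged else "   ") + text)
```

The "!!" marker makes the corosync/fenced lines stand out, but paste the whole window to the list, not just the flagged lines. The unflagged kernel and NIC messages are often the real clue.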
If this is true, do your switches support persistent multicast? Do you
use active/passive bonding? Have you tried different switch/cable/NIC?
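On the multicast question, one way to check whether the switches keep delivering multicast over a long period is to send sequence-numbered datagrams from one node and watch for silences on the others. A common symptom is a switch doing IGMP snooping without a querier, which eventually stops forwarding the group. The sketch below is a hypothetical helper, not part of any cluster tool. The group 239.192.100.1 and port 50505 are made-up values chosen so they do not collide with corosync's own traffic, and the 3x-interval threshold for a "gap" is an arbitrary assumption.

```python
import socket
import struct
import sys
import time

# Hypothetical test group/port; must not be the cluster's own multicast address.
GROUP, PORT = "239.192.100.1", 50505


def find_gaps(arrivals, interval, tolerance=3.0):
    """Return (last_seq, next_seq, silence_s) wherever no packet arrived for
    longer than tolerance * interval. `arrivals` is [(seq, arrival_time), ...]."""
    gaps = []
    for (s0, t0), (s1, t1) in zip(arrivals, arrivals[1:]):
        if t1 - t0 > tolerance * interval:
            gaps.append((s0, s1, t1 - t0))
    return gaps


def send(count, interval):
    """Send `count` 4-byte sequence numbers to the group, one per interval."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    for seq in range(count):
        sock.sendto(struct.pack("!I", seq), (GROUP, PORT))
        time.sleep(interval)


def receive(duration):
    """Join the group and record (seq, arrival_time) for `duration` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # struct ip_mreq: group address + local interface (0.0.0.0 = any)
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    sock.settimeout(1.0)
    arrivals = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            data, _ = sock.recvfrom(16)
        except socket.timeout:
            continue
        if len(data) >= 4:
            arrivals.append((struct.unpack("!I", data[:4])[0], time.monotonic()))
    return arrivals


if __name__ == "__main__":
    interval = 0.5
    if sys.argv[1:] == ["send"]:
        send(count=7200, interval=interval)  # roughly one hour
    else:
        arrivals = receive(duration=3600)
        print(f"received {len(arrivals)} packets")
        for last, nxt, silence in find_gaps(arrivals, interval):
            print(f"no packets for {silence:.1f}s between seq {last} and {nxt}")
```

Start the receiver on the other nodes first, then run `python mcast_check.py send` on one node. A gap that appears after a few minutes and recurs at a regular period would point at the switch rather than at the cluster stack.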
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?