[Linux-cluster] Node is randomly fenced

Digimer lists at alteeve.ca
Wed Jun 4 15:13:15 UTC 2014


On 04/06/14 10:59 AM, Schaefer, Micah wrote:
> I have a 4 node cluster, running a single service group. I have been
> seeing node1 fence node3 while node3 is actively running the service group
> at random intervals.
>
> Rgmanager logs show no failures in service checks, and no other logs
> provide any useful information. How can I go about finding out why node1
> is fencing node3?
>
> I currently set up the failover domain to be restricted and not include
> node3.
>
> cluster.conf : http://pastebin.com/xYy6xp6N

Random fencing is almost always caused by network failures. Can you look
at the system logs, starting a little before the fence and continuing
until after the fence completes, and paste them here? I suspect you will
see corosync complaining.
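
If the logs are large, a rough Python sketch like the one below can pull
out just the lines around the fence. It assumes traditional syslog
timestamps (for example "Jun  4 10:58:30") at the start of each line and
that you know roughly when the fence happened. The function name
lines_near, the five-minute margins and the year default are illustrative
choices, not part of any cluster tool.

```python
from datetime import datetime, timedelta


def lines_near(lines, fence_time,
               before=timedelta(minutes=5),
               after=timedelta(minutes=5),
               year=2014):
    """Return the log lines stamped within the window around fence_time.

    Assumption: each line starts with a 15-character traditional syslog
    timestamp, which carries no year, so the year is passed in separately.
    """
    start, end = fence_time - before, fence_time + after
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            # Not a timestamped line (continuation, blank line, etc.); skip it.
            continue
        if start <= stamp <= end:
            selected.append(line.rstrip("\n"))
    return selected
```

To use it, read the system log on each node (often /var/log/messages, but
that path is an assumption about your syslog setup), pass the lines in
with the approximate fence time, and paste the result from both the
fencing node and the fenced node.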

If corosync does report problems, do your switches support persistent
multicast? Do you use active/passive bonding? Have you tried a different
switch, cable or NIC?

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



