[Linux-cluster] Fencing problem w/ 2-node VM when a VM host dies

Kelvin Edmison kelvin.edmison at alcatel-lucent.com
Thu Dec 3 19:19:37 UTC 2015


I am hoping that someone can help me understand the problems I'm having
with Linux clustering for VMs.

I am clustering 2 VMs on two separate VM hosts, trying to ensure that a 
service is always available.  The hosts and guests are both RHEL 6.7. 
The goal is to have only one of the two VMs running at a time.

The configuration works when we simulate VM deaths, graceful VM host
shutdowns, and administrative switchovers (i.e. clusvcadm -r).

However, when we simulate the sudden isolation of host A (e.g. ifdown
eth0), two things happen:
1) The VM on host B does not start, and repeated fence_xvm errors appear
in the logs on host B.
2) When the 'failed' node is returned to service, the cman service on
host B dies.

This is my cluster.conf file (with hostnames elided):

<?xml version="1.0"?>
<cluster config_version="14" name="clustername">
     <fence_daemon/>
     <clusternodes>
         <clusternode name="hostA.fqdn" nodeid="1">
             <fence>
                 <method name="VmFence">
                     <device name="virtfence1" port="jobhistory"/>
                 </method>
             </fence>
         </clusternode>
         <clusternode name="hostB.fqdn" nodeid="2">
             <fence>
                 <method name="VmFence">
                     <device name="virtfence2" port="jobhistory"/>
                 </method>
             </fence>
         </clusternode>
     </clusternodes>
     <cman expected_votes="1" two_node="1"/>
     <fencedevices>
         <fencedevice agent="fence_xvm"
                      key_file="/etc/cluster/fence_xvm_hostA.key"
                      multicast_address="239.255.1.10" name="virtfence1"/>
         <fencedevice agent="fence_xvm"
                      key_file="/etc/cluster/fence_xvm_hostB.key"
                      multicast_address="239.255.2.10" name="virtfence2"/>
     </fencedevices>
     <rm>
         <failoverdomains/>
         <resources/>
         <vm autostart="1" name="jobhistory" recovery="restart"
             use_virsh="1"/>
     </rm>
     <logging/>
</cluster>
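
To make the fencing layout easier to read at a glance, here is a rough sketch that lists, for each cluster node, the fence device it uses together with that device's multicast address and key file. It only parses the XML; it does not talk to cman or fence_xvm. The function name fence_map and the SAMPLE string are hypothetical and exist only for illustration; SAMPLE is a trimmed copy of the configuration above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above (illustrative only).
SAMPLE = """<cluster config_version="14" name="clustername">
  <clusternodes>
    <clusternode name="hostA.fqdn" nodeid="1">
      <fence><method name="VmFence">
        <device name="virtfence1" port="jobhistory"/>
      </method></fence>
    </clusternode>
    <clusternode name="hostB.fqdn" nodeid="2">
      <fence><method name="VmFence">
        <device name="virtfence2" port="jobhistory"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_xvm"
                 key_file="/etc/cluster/fence_xvm_hostA.key"
                 multicast_address="239.255.1.10" name="virtfence1"/>
    <fencedevice agent="fence_xvm"
                 key_file="/etc/cluster/fence_xvm_hostB.key"
                 multicast_address="239.255.2.10" name="virtfence2"/>
  </fencedevices>
</cluster>"""


def fence_map(conf_text):
    """Map each node name to a list of
    (device name, port, multicast address, key file) tuples."""
    root = ET.fromstring(conf_text)
    # Index the <fencedevice> definitions by their name attribute.
    devices = {d.get("name"): d for d in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        entries = []
        for dev in node.iter("device"):
            fd = devices.get(dev.get("name"))
            entries.append((
                dev.get("name"),
                dev.get("port"),
                fd.get("multicast_address") if fd is not None else None,
                fd.get("key_file") if fd is not None else None,
            ))
        result[node.get("name")] = entries
    return result


if __name__ == "__main__":
    for node, entries in fence_map(SAMPLE).items():
        for name, port, mcast, key in entries:
            print(node, name, port, mcast, key)
```

A device whose name has no matching fencedevice entry shows up with None for the address and key file, which is one quick way to spot a typo in the device names.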


Thanks for any help you can offer,
   Kelvin Edmison



