[Linux-cluster] Fence methods

Thu Sep 6 20:45:33 UTC 2012

Now that ricci is figured out, I am having some issues with fencing.

VMware fencing seems to work very well, but our GFS2 volume is unavailable until the fence operation returns a "success" status. That leaves a window of roughly 30-60 seconds in which we cannot access the GFS2 volumes, which amounts to downtime. SCSI fencing seems faster, but very unreliable: if I try to fence a node, it returns "fence somenode success". Great. But the node can still access the GFS2 volume.

I am also seeing conflicting information about using Qdisk with fence_scsi. It seems to be a no-no, yet I could swear I saw a note somewhere saying that Qdisk and fence_scsi work together in newer versions of RHEL.

So what is my best bet for keeping GFS2 as available as possible when a node fails, or (an even bigger concern) when a node is simply rebooted to apply, say, a software patch?

cluster.conf as it stands now:

<?xml version="1.0"?>
<cluster config_version="34" name="Xanadu">
  <clusternodes>
    <clusternode name="xanadunode1" nodeid="1">
      <fence>
        <method name="Method2">
          <device name="SCSI_Fence"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="SCSI_Fence"/>
      </unfence>
    </clusternode>
    <clusternode name="xanadunode2" nodeid="2">
      <fence>
        <method name="Method2">
          <device name="SCSI_Fence"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="SCSI_Fence"/>
      </unfence>
    </clusternode>
    <clusternode name="xanadunode3" nodeid="3">
      <fence>
        <method name="Method2">
          <device name="SCSI_Fence"/>
        </method>
      </fence>
      <unfence>
        <device action="on" name="SCSI_Fence"/>
      </unfence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_vmware_soap" ipaddr="vsphere.innova.local" login="vmwarefence" name="VMWare_Fence" passwd="XXXXXXXX"/>
    <fencedevice agent="fence_scsi" name="SCSI_Fence"/>
  </fencedevices>
  <cman expected_votes="5"/>
  <quorumd label="quorum"/>
  <rm>
    <failoverdomains>
      <failoverdomain name="Cluster Management">
        <failoverdomainnode name="xanadunode1"/>
        <failoverdomainnode name="xanadunode2"/>
        <failoverdomainnode name="xanadunode3"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="192.168.30.78" sleeptime="2"/>
    </resources>
    <service domain="Cluster Management" name="Cluster Management" recovery="relocate">
      <ip ref="192.168.30.78"/>
    </service>
  </rm>
</cluster>
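
Note that VMWare_Fence is defined in the configuration above but is not referenced by any node; only the SCSI_Fence method is active. For comparison, a node entry that lists both devices might look roughly like the untested sketch below. The method name "Method1", the port value "xanadunode1-vm" (a hypothetical VM name as seen by vSphere), and ssl="on" are assumptions for illustration, not part of the running config. Methods are tried in the order listed, so SCSI fencing would only be attempted if the VMware method fails.

```xml
<!-- Sketch only: one node entry with two fence methods, tried in order. -->
<!-- Assumptions: "Method1", port="xanadunode1-vm" and ssl="on" are hypothetical values. -->
<clusternode name="xanadunode1" nodeid="1">
  <fence>
    <!-- Tried first: power fencing through vSphere -->
    <method name="Method1">
      <device name="VMWare_Fence" port="xanadunode1-vm" ssl="on"/>
    </method>
    <!-- Tried only if Method1 fails: SCSI persistent reservation fencing -->
    <method name="Method2">
      <device name="SCSI_Fence"/>
    </method>
  </fence>
  <!-- Unfencing re-registers the node's key with fence_scsi at startup -->
  <unfence>
    <device action="on" name="SCSI_Fence"/>
  </unfence>
</clusternode>
```

This ordering would not shorten the 30-60 second wait described earlier, since GFS2 recovery still has to wait for whichever method runs to report success.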

________________________________________
Chip Burke
