[Linux-cluster] Few queries about fence working

Thu Jan 26 13:43:16 UTC 2012

On Thu, 26 Jan 2012 18:13:53 +0530, jayesh.shinde at netcore.co.in wrote:
> Dear Digimer & Kaloyan Kovachev,
> 
> Do you think this server shutdown problem (when both nodes fence each
> other simultaneously via drbd.conf) can be completely avoided if I use a
> SAN disk instead of a DRBD disk?
> 
> i.e. with a SAN disk, will the fence config defined in cluster.conf
> take care of the network failure and the related fencing of the node?
> 
> What would you suggest, SAN or DRBD disk?
> Please guide me.

It has nothing to do with SAN vs. DRBD - it is the cluster software that
takes care of the fencing.
There is another mail thread here, "Halt nodes in cluster with cable
disconnect", going on at the same time about the same problem.

> 
> Regards
> Jayesh Shinde
> 
> Quoting Digimer <linux at alteeve.com>:
> 
>> On 01/25/2012 08:57 AM, jayesh.shinde wrote:
>>> Hi Kaloyan Kovachev,
>>>
>>> I am using the config below in drbd.conf, which is mentioned in the DRBD
>>> cookbook.
>>>
>>> }
>>>   disk {
>>>     fencing resource-and-stonith;
>>>   }
>>>   handlers {
>>>     outdate-peer "/sbin/obliterate";
>>>
>>> In the /sbin/obliterate script, "fence_node" is mentioned.
>>>
>>> Do you know what the default method is with "fence_node $REMOTE", i.e.
>>> reboot or power-off?
>>>
>>> Dear Digimer,
>>>
>>> Can you please guide me here.
>>>
>>> Currently I do not have a test machine to test it on, so all members'
>>> inputs will help me a lot to understand it.
>>>
>>> Below is the /sbin/obliterate
>>
>> I updated the tutorial to address this last night;
>>
>> https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Hooking_DRBD_Into_The_Cluster.27s_Fencing
>>
>> and
>>
>> https://alteeve.com/w/2-Node_Red_Hat_KVM_Cluster_Tutorial#Configuring_DRBD_Global_and_Common_Options
>>
>> In short: this is a problem where the fence devices, IPMI and DRAC here,
>> get the call to shut down their host but don't act on it fast enough to
>> block the call heading to the other node.
>>
>> The obliterate scripts (obliterate is an older version of
>> obliterate-peer.sh, which I am working to replace with rhcs_fence now)
>> call cman to remove the peer node from the cluster, then call the actual
>> fence. For this reason, the delay set in cluster.conf won't help.
>>
>> The options are to add a 'sleep 10;' to the start of *one* node's
>> obliterate or obliterate-peer.sh script. Alternatively, rhcs_fence uses
>> the node's ID to calculate a delay automatically to help avoid these
>> dual-fence scenarios.
>>
>> --
>> Digimer
>> E-Mail:              digimer at alteeve.com
>> Papers and Projects: https://alteeve.com
>>
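
For illustration only, the idea of deriving a per-node delay from the node's
ID (so that the two nodes do not fence each other at the same moment) might
be sketched as below. This is not the rhcs_fence code: the function name
fence_delay, the LOCAL_NODE_ID variable and the 10-second step are
assumptions made up for this example. Only fence_node and $REMOTE come from
the thread above.

```shell
#!/bin/sh
# Hypothetical sketch: stagger fence calls by node ID to avoid a dual fence.
# The formula and all names here are illustrative assumptions, not taken
# from obliterate, obliterate-peer.sh or rhcs_fence.

# fence_delay NODE_ID [STEP_SECONDS]
# Prints the number of seconds to wait before fencing the peer.
# Node 1 gets no delay; each higher node ID waits STEP_SECONDS longer
# (the step defaults to 10, matching the 'sleep 10' suggestion above).
fence_delay() {
    node_id=$1
    step=${2:-10}
    echo $(( (node_id - 1) * step ))
}

# Possible use at the start of a handler script (left commented out so the
# sketch has no side effects; LOCAL_NODE_ID is a hypothetical variable):
# sleep "$(fence_delay "$LOCAL_NODE_ID")"
# fence_node "$REMOTE"
```

With a scheme like this, the node with the lowest ID fences first and the
other node should already be powered off before its own delay expires, which
is the same effect as adding the 'sleep 10;' by hand to one node's script.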