[Linux-cluster] Few queries about fence working

Thu Jan 26 13:51:46 UTC 2012

On Thu, 26 Jan 2012 08:29:01 -0500, Digimer <linux at alteeve.com> wrote:
> On 01/26/2012 07:43 AM, jayesh.shinde at netcore.co.in wrote:
>> Dear Digimer & Kaloyan Kovachev ,
>> 
>> Do you think this server shutdown problem (when both nodes fence each
>> other simultaneously via drbd.conf) can be completely avoided if I use
>> a SAN disk instead of a DRBD disk?
>> 
>> i.e. with a SAN disk, will the fence config defined in cluster.conf
>> take care of the network failure and the related fencing of the node?
>> 
>> Which would you suggest, SAN or DRBD disk?
>> Please guide me.
>> 
>> Regards
>> Jayesh Shinde
> 
> It won't fundamentally remove the issue. Any time there is a breakdown
> in communication between nodes in a two-node cluster, there is going to
> be a simultaneous fence call made. Ideally, you would have a fence
> device that would not buffer calls, but that may not be feasible in
> your case.
> 
> This is why fence delays exist - specifically to allow one node to
> always complete a fence operation before another. If you really want to
> avoid having the same node survive a fence call in a split like this,
> then your best bet is to add a 3rd node for quorum. However, once you
> do, the obliterate fence handler will no longer work as it is restricted
> to 2 node clusters only (one of the things rhcs_fence resolves, but it
> isn't tested on EL5).
> 

There is also another, modified version of outdate_peer at
http://lists.linbit.com/pipermail/drbd-user/2011-October/016998.html
A third node for quorum is the best way to go, and the script above does
work with more than 2 nodes (4 in my case).

> To be honest though, is there really a problem with having one node
> pre-defined to win a dual-fence call?

Adding a random sleep of between 2 and 5 seconds can deal with that too
(most of the time), but I think it is preferable to always have the same
node win the duel.
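
To make the two options concrete, here is a rough Python sketch of the
delay logic only. It is not taken from outdate_peer, obliterate or
rhcs_fence; the function names and the node names "node1"/"node2" are
made up for illustration, and a real handler would sleep for the returned
time before calling its fence agent.

```python
import random


def fence_delay(node, preferred_winner=None, low=2.0, high=5.0, rng=random):
    """Return how many seconds `node` should wait before fencing its peer.

    Hypothetical helper, not part of any existing fence handler.
    - If `preferred_winner` is set, that node fences immediately and every
      other node waits `high` seconds, so the same node always wins.
    - Otherwise each node picks a random delay between `low` and `high`,
      which avoids a tie most of the time but not always.
    """
    if preferred_winner is not None:
        return 0.0 if node == preferred_winner else high
    return rng.uniform(low, high)


def duel_winner(delays):
    """Given {node: delay}, return the node that fences first.

    Returns None when the two smallest delays are equal, i.e. the case
    where both nodes would still fire their fence calls together.
    """
    ordered = sorted(delays.items(), key=lambda item: item[1])
    if len(ordered) > 1 and ordered[0][1] == ordered[1][1]:
        return None
    return ordered[0][0]


if __name__ == "__main__":
    # Fixed delay: node1 is pre-defined to win every dual-fence call.
    fixed = {n: fence_delay(n, preferred_winner="node1")
             for n in ("node1", "node2")}
    print("fixed delays:", fixed, "winner:", duel_winner(fixed))

    # Random delay: the winner varies from one split to the next.
    rand = {n: fence_delay(n) for n in ("node1", "node2")}
    print("random delays:", rand, "winner:", duel_winner(rand))
```

With the fixed delay the outcome is known in advance, which is why I
prefer it; with the random delay you only learn after the fact which node
survived.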