[Linux-cluster] Few queries about fence working

Kaloyan Kovachev kkovachev at varna.net
Wed Jan 25 15:00:22 UTC 2012


On Wed, 25 Jan 2012 19:27:28 +0530, "jayesh.shinde"
<jayesh.shinde at netcore.co.in> wrote:
> Hi Kaloyan Kovachev ,
> 
> I am using the below config in drbd.conf, which is mentioned in the DRBD
> cookbook.
> 
>    disk {
>      fencing resource-and-stonith;
>    }
>    handlers {
>      outdate-peer "/sbin/obliterate";
>    }
> 
> In the /sbin/obliterate script, "fence_node" is used.
> 
> Do you know what the default method is with "fence_node $REMOTE", i.e.
> reboot or power-off?
> 

It depends on the fence agent and/or what is configured in your
cluster.conf, but in most cases it will be a reboot (which is usually
performed as power off followed by power on).
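A minimal cluster.conf sketch, for illustration only (the agent, device
names and addresses below are hypothetical, assuming fence_ipmilan): the
"action" attribute on the device selects what the agent does, and if it is
omitted most agents default to reboot:

  <clusternode name="node1" nodeid="1">
    <fence>
      <method name="1">
        <!-- hypothetical IPMI device; action="off" would power the node
             off instead of the default reboot -->
        <device name="ipmi_node1" action="reboot"/>
      </method>
    </fence>
  </clusternode>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" name="ipmi_node1"
                 ipaddr="10.0.0.11" login="admin" passwd="secret"/>
  </fencedevices>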
In this case (DRBD is using the cluster's fencing) we are back to
cluster.conf. But you said the cluster failed to fence the remote node, so
DRBD would keep blocking IO to the device and the cluster would remain
inquorate - no services running.
When the connection comes back, for several possible reasons (a split-brain
is detected, or a pending/subsequent fence attempt is made), both servers
may try to fence each other at the same time, and you end up with both
servers down.
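As a hedged sketch only (not something taken from your config): a common way
to reduce the chance of such a mutual shoot-out in a 2-node cluster is to
give one node's fence device a small "delay", so the other node always wins
a simultaneous fence race (the attribute is supported by most recent fence
agents; the device name below is hypothetical):

  <!-- on node2's fence entry only: wait 15 seconds before fencing, so if
       both nodes try to fence at the same moment, node1 shoots first -->
  <device name="ipmi_node2" action="reboot" delay="15"/>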

> Dear Digimer ,
> 
> Can you please guide me here.
> 
> Currently I do not have a test machine to try this on, so all members'
> inputs will help me a lot in understanding it.
> 
> Below is the /sbin/obliterate script:
> 
> 
> #!/bin/bash
> # ###########################################################
> # DRBD 0.8.2.1 -> linux-cluster super-simple fencing wrapper
> #
> # Copyright Red Hat, 2007
> #
> # Licensed under the GNU General Public License version 2
> # which is incorporated herein by reference:
> #
> #   http://www.gnu.org/licenses/gpl-2.0.html
> #
> # At your option, you may choose to license this software
> # under any later version of the GNU General Public License.
> #
> # This software is distributed in the hopes that it will be
> # useful, but without warranty of any kind.
> #
> # Kills the other node in a 2-node cluster.  Only works with
> # 2-node clusters (FIXME?)
> #
> # ###########################################################
> #
> # Author: Lon Hohberger <lhh[a]redhat.com>
> #
> # Special thanks to fabioc on freenode
> #
> 
> PATH="/bin:/sbin:/usr/bin:/usr/sbin"
> 
> NODECOUNT=0
> LOCAL_ID=$(cman_tool status 2>/dev/null | grep '^Node ID:' | awk '{print $3}')
> REMOTE_ID=""
> REMOTE=""
> 
> if [ -z "$LOCAL_ID" ]; then
>          echo "Could not determine local node ID!"
>          exit 1
> fi
> 
> # Shoot the other guy.
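> # cman_tool nodes output looks like: "Node  Sts  Inc  Joined  Name";
> # the loop below reads the node id (field 1) and node name (field 6)
> # for every node, skipping the header line.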
> while read nid nodename; do
>          if [ "$nid" = "0" ]; then
>                  continue
>          fi
> 
>          ((NODECOUNT++))
> 
>          if [ "$nid" != "$LOCAL_ID" ]; then
>                  REMOTE_ID=$nid
>                  REMOTE=$nodename
>          fi
> done < <(cman_tool nodes 2>/dev/null | grep -v '^Node' | awk '{print $1,$6}')
> 
> if [ $NODECOUNT -ne 2 ]; then
>          echo "Only works with 2 node clusters"
>          exit 1
> fi
> 
> if [ -z "$REMOTE_ID" ] || [ -z "$REMOTE" ]; then
>          echo "Could not determine remote node"
>          exit 1
> fi
> 
> echo "Local node ID: $LOCAL_ID"
> echo "Remote node ID: $REMOTE_ID"
> echo "Remote node: $REMOTE "
> 
> #
> # This could be cleaner by calling cman_tool kill -n <node>, but then we
> # have to poll/wait for fence status, and I don't feel like writing that
> # right now.  Note that GFS *will* wait for this to occur, so if you're
> # using GFS on DRBD, you still don't get access. ;)
> #
> fence_node $REMOTE
> 
> if [ $? -eq 0 ]; then
>          #
>          # Reference:
>          #
>          #   http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html
>          #
>          # 7 = node got blown away.
>          #
>          exit 7
> fi
> 
> #
> # Fencing failed?!
> #
> exit 1
> 
> Regards
> Jayesh Shinde
> 
> 
> 
> 
> On 01/25/2012 04:02 PM, Kaloyan Kovachev wrote:
>>> <resources>
>>> <ip address="192.168.1.1" monitor_link="1"/>
>>> <fs device="/dev/drbd0" force_fsck="0" force_unmount="1" fsid="28418"
>>> fstype="ext3" mountpoint="/mount/path" name="imap1_fs" options="rw"
>>> self_fence="1"/>
>> You have self_fence, which should reboot the node instead of powering it
>> off, but as you are using drbd the power off may be caused by drbd
>> instead (check drbd.conf)
>>
>>> <script file="/etc/init.d/cyrus-imapd" name="imap1_init"/>
>>> </resources>
>> In either case, if the remote node is not fenced it is safer to shut down
>> instead of having the service running on both nodes, so I wouldn't change
>> anything
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster



