[Linux-cluster] Re: [RFC] DRBD + GFS cookbook (Lon Hohberger)

Thu Dec 13 04:37:07 UTC 2007

>> when this handler gets called both nodes will try to fence each
>> other. Is that the intended effect?

>Yes, in a network partition of a two-node cluster, both nodes will race
>to fence.  One wins, the other dies. ;)

OK.

>> b) If we try to do ssh <host> -c "drbdadm outdate all", gfs is still
>> mounted on top of drbd and drbd is primary, so there is no effect of the
>> command and the split brain continues. I have seen this.

>... but with resource-and-stonith, drbd freezes I/O until the
>outdate-peer script returns a 4 or 7...  If it doesn't return

Could you please explain what happens if it doesn't return?
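
For reference, the two return codes mentioned above can be written down as a small bash helper. This is only a sketch: the function name describe_handler_rc is made up for illustration, and only codes 4 and 7 come from this thread (their meanings are taken from the comments in the obliterate script). Any other code is deliberately reported as not covered rather than guessed.

```shell
#!/bin/bash
# Map an outdate-peer handler exit code to the meaning given in this thread.
# Only 4 and 7 are discussed here; everything else is reported as "not covered".
# describe_handler_rc is a hypothetical helper name, not part of DRBD.
describe_handler_rc() {
    case "$1" in
        4) echo "peer is outdated (resource fencing)" ;;
        7) echo "peer node got blown away (STONITH)" ;;
        *) echo "not covered in this thread (rc=$1)" ;;
    esac
}

describe_handler_rc 4
describe_handler_rc 7
describe_handler_rc 1
```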

> Fail over an xdmcp session?  I think xdm/gdm/etc. were not designed to
> handle that sort of a failure case.  It sounds like a cool idea, but I
> would not even know where to begin to make that work.

Well, I could keep asking around, and maybe someday somebody will have an idea.

Also, could you please explain the following from your obliterate script:

<quote>
# now.  Note that GFS *will* wait for this to occur, so if you're using GFS
# on DRBD, you still don't get access. ;)

</quote>

What will GFS wait for? Fence status? 

Also, since our APC MasterSwitch hasn't arrived yet, I modified the obliterate script to use ssh to do the dirty work
(instead of using RHCS fencing, partly because I have a 3-node cluster; I also defined REMOTE manually on each of the two DRBD nodes).
Here it is. Could you please comment on whether it will work?

#!/bin/bash
# ###########################################################
# DRBD 0.8.2.1 -> linux-cluster super-simple fencing wrapper
#
# Kills the other node in a 2-node cluster.  Only works with
# 2-node clusters (FIXME?)
#
# ###########################################################
#
# Author: Lon Hohberger <lhh[a]redhat.com>
#
# Special thanks to fabioc on freenode
#

PATH="/bin:/sbin:/usr/bin:/usr/sbin"

NODECOUNT=0
LOCAL_ID="2"
REMOTE_ID="1"
REMOTE="imstermserver1"

echo "Local node ID: $LOCAL_ID"
echo "Remote node ID: $REMOTE_ID"
echo "Remote node: $REMOTE"

#
# This could be cleaner by calling cman_tool kill -n <node>, but then we have
# to poll/wait for fence status, and I don't feel like writing that right
# now.  Note that GFS *will* wait for this to occur, so if you're using GFS
# on DRBD, you still don't get access. ;)
#

#fence_node $REMOTE

# logger -f <file> logs the contents of <file>; it does not write to it.
# A plain logger call sends the message to syslog.
logger "$0 : Fencing Node : $REMOTE"

ssh "$REMOTE" drbdadm outdate all
if [ $? -eq 0 ]; then
        logger "$0 : drbdadm outdate all on $REMOTE succeeded"
        #
        # Reference:
        # http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html
        #
        # 4 = -> peer is outdated (this handler outdated it) [ resource fencing ]
        #
        ssh $REMOTE drbdadm resume-io all
        exit 4
fi
logger "$0 : drbdadm outdate all on $REMOTE FAILED!!"
ssh "$REMOTE" poweroff -f
if [ $? -eq 0 ]; then
        logger "$0 : poweroff -f on $REMOTE succeeded"
        #
        # Reference:
        # http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html
        #
        # 7 = node got blown away.
        #
        ssh $REMOTE drbdadm resume-io all
        exit 7
fi

logger "$0 : poweroff -f on $REMOTE FAILED!!"
#
# Fencing failed?!
#

logger "$0 : FENCING on $REMOTE FAILED!!"

# Go along with split brain..

ssh "$REMOTE" drbdadm resume-io all
drbdadm resume-io all

exit 1
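
The outdate-then-poweroff decision in the script above can also be sketched as a function that can be dry-run without touching a real peer. This is an illustration under assumptions, not a tested fencing agent: remote_run, fence_peer and the REMOTE_RUN hook are made-up names, and the 10-second ConnectTimeout is an arbitrary value. The exit codes 4, 7 and 1 are the ones used in the script above.

```shell
#!/bin/bash
# Sketch only. REMOTE_RUN is a hypothetical hook: when set, it names a command
# or function that replaces ssh, so the logic can be exercised without a peer.
remote_run() {
    if [ -n "$REMOTE_RUN" ]; then
        "$REMOTE_RUN" "$@"
    else
        local host="$1"
        shift
        # BatchMode avoids hanging on a password prompt; ConnectTimeout
        # bounds how long an unreachable peer can stall the handler.
        ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" "$@"
    fi
}

# Try to outdate the peer first, then fall back to powering it off.
# Returns 4 (peer outdated), 7 (peer powered off) or 1 (both attempts failed).
fence_peer() {
    local remote="$1"
    if remote_run "$remote" drbdadm outdate all; then
        return 4
    fi
    if remote_run "$remote" poweroff -f; then
        return 7
    fi
    return 1
}

# Dry run with a stand-in that always succeeds (no real peer is contacted).
always_ok() { return 0; }
REMOTE_RUN=always_ok
rc=0
fence_peer imstermserver1 || rc=$?
echo "dry-run exit code: $rc"
```

Two caveats apply to the ssh approach in general. Once the peer has been powered off, any further ssh command sent to it cannot succeed. Also, ssh may report a failure if the connection drops while the peer is going down, so its exit status alone may not be a reliable confirmation that the poweroff worked.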

Regards
Koustubha Kale
