[Linux-cluster] Re: [RFC] DRBD + GFS cookbook (Lon Hohberger)
Koustubha Kale
koustubha_kale at yahoo.com
Thu Dec 13 04:37:07 UTC 2007
>> when this handler gets called both nodes will try to fence each
>> other. Is that the intended effect?
>Yes, in a network partition of a two-node cluster, both nodes will race
>to fence. One wins, the other dies. ;)
OK.
>> b) If we try to do ssh <host> -c "drbdadm outdate all", gfs is still
>> mounted on top of drbd and drbd is primary, so there is no effect of the
>> command and the split brain continues. I have seen this.
>... but with resource-and-stonith, drbd freezes I/O until the
>outdate-peer script returns a 4 or 7... If it doesn't return
Could you please explain the "If it doesn't return" part?
> Fail over an xdmcp session? I think xdm/gdm/etc. were not designed to
> handle that sort of a failure case. It sounds like a cool idea, but I
> would not even know where to begin to make that work.
Well, I could keep asking the question around, and maybe someday somebody will have an idea.
Also, could you please explain the following from your obliterate script?
<quote>
# now. Note that GFS *will* wait for this to occur, so if you're using GFS
# on DRBD, you still don't get access. ;)
</quote>
What will GFS wait for? Fence status?
Also, since our APC MasterSwitch hasn't arrived yet, I modified the obliterate script to use ssh to do the dirty work.
(That is, instead of using RHCS fencing, also because I have a 3-node cluster. I also defined REMOTE manually on each of the two drbd nodes.)
Here it is. Will it work? Please comment.
#!/bin/bash
# ###########################################################
# DRBD 0.8.2.1 -> linux-cluster super-simple fencing wrapper
#
# Kills the other node in a 2-node cluster. Only works with
# 2-node clusters (FIXME?)
#
# ###########################################################
#
# Author: Lon Hohberger <lhh[a]redhat.com>
#
# Special thanks to fabioc on freenode
#
PATH="/bin:/sbin:/usr/bin:/usr/sbin"
NODECOUNT=0
LOCAL_ID="2"
REMOTE_ID="1"
REMOTE="imstermserver1"
echo "Local node ID: $LOCAL_ID"
echo "Remote node ID: $REMOTE_ID"
echo "Remote node: $REMOTE"
#
# This could be cleaner by calling cman_tool kill -n <node>, but then we have
# to poll/wait for fence status, and I don't feel like writing that right
# now. Note that GFS *will* wait for this to occur, so if you're using GFS
# on DRBD, you still don't get access. ;)
#
#fence_node $REMOTE
logger "$0 : Fencing Node : $REMOTE"
ssh "$REMOTE" drbdadm outdate all
if [ $? -eq 0 ]; then
logger "$0 : drbdadm outdate all on $REMOTE succeeded"
#
# Reference:
# http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html
#
# 4 = -> peer is outdated (this handler outdated it) [ resource fencing ]
#
ssh $REMOTE drbdadm resume-io all
exit 4
fi
logger "$0 : drbdadm outdate all on $REMOTE FAILED!!"
ssh "$REMOTE" poweroff -f
if [ $? -eq 0 ]; then
logger "$0 : poweroff -f on $REMOTE succeeded"
#
# Reference:
# http://osdir.com/ml/linux.kernel.drbd.devel/2006-11/msg00005.html
#
# 7 = node got blown away.
#
ssh $REMOTE drbdadm resume-io all
exit 7
fi
logger "$0 : poweroff -f on $REMOTE FAILED!!"
#
# Fencing failed?!
#
logger "$0 : FENCING on $REMOTE FAILED!!"
# Go along with split brain..
ssh $REMOTE drbdadm resume-io all
drbdadm resume-io all
exit 1
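
For comparison, the same decision flow can be sketched with the remote command behind a function, so that each exit path can be dry-run without a real peer. This is only a sketch: REMOTE_CMD, run_remote and fence_peer are hypothetical names introduced here, not part of DRBD or RHCS, the exit codes 4, 7 and 1 are simply the ones used in the script above, and the resume-io calls are left out to keep it short.

```shell
# Sketch only: REMOTE_CMD, run_remote and fence_peer are hypothetical names.
REMOTE="${REMOTE:-imstermserver1}"

# Command used to reach the peer. Defaults to ssh; override it for a dry run.
REMOTE_CMD="${REMOTE_CMD:-ssh}"

# Run the given command line on the peer through $REMOTE_CMD.
run_remote() {
    $REMOTE_CMD "$REMOTE" "$@"
}

# Try resource fencing first, then fall back to powering the peer off.
# Returns the exit code the handler would hand back to DRBD.
fence_peer() {
    if run_remote drbdadm outdate all; then
        return 4    # peer is outdated (this handler outdated it)
    fi
    if run_remote poweroff -f; then
        return 7    # node got blown away
    fi
    return 1        # fencing failed
}
```

Ending a handler with "fence_peer" followed by "exit $?" would give the same exit codes as the script above. Setting REMOTE_CMD=true or REMOTE_CMD=false exercises the success and failure paths without touching the peer.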
Regards
Koustubha Kale