[Linux-cluster] fence_sanbox2 configuration?

Wed Jun 29 19:59:45 UTC 2005

On 29, Jun, 2005, David Teigland declared:

> On Tue, Jun 28, 2005 at 01:33:38PM -0400, Dan B. Phung wrote:
> > I've been trying to configure my gfs cluster to use the fence_sanbox2, and
> > don't quite have it right.  my configuration is pasted below.  To test it,
> > I reboot one of the machines in the cluster, but instead of getting fenced
> > and the journal replayed, the rest of the cluster just hangs (errors
> > below).  In addition, when I telnet to the fibre switch, the port is not
> > disabled as I thought it would be.  I'm able to use fence_sanbox2 directly
> > to disable and enable the port.  So...does anybody see what I'm doing
> > wrong?  How can I debug this further?  Does anything special need to be
> > done to the init scripts to disable/reenable the port, or does the fenced
> > take care of that?
> > 
> > I'm using the RHEL4 branch, which I cvs updated a couple weeks back.

I can't reproduce the error right now because my volume somehow disappeared
after my last test.  I'm trying to recover it with vgcfgrestore, but to no
avail.  Below is the output from before rebooting a machine.  After
rebooting a machine, the output is the same, since nothing is mounted.

> It appears that recovery isn't even getting to the fencing stage.  The
> fencing configuration looks fine.  Could you send the output of

> $ cman_tool status
Protocol version: 5.0.1
Config version: 1
Cluster name: blade_cluster
Cluster ID: 38068
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 3
Expected_votes: 1
Total_votes: 3
Quorum: 2
Active subsystems: 3
Node name: blade01
Node addresses: 128.50.18.1

> $ cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    1   M   blade01
   3    1    1   M   blade03
   4    1    1   M   blade04

> $ cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[3 1 4]

DLM Lock Space:  "clvmd"                             2   3 run       -
[3 1 4]

> run on all three nodes both before and after you reboot a node.
> 
> Dave
> 
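For comparing the before and after captures Dave asks for, here is a rough Python sketch. It assumes the text of each `cman_tool status` run has been saved somewhere and only relies on the "Key: value" layout visible in the output above. The function names `parse_status` and `diff_status` are made up for illustration, and the "after" values in the demo are hypothetical, not real output from this cluster.

```python
def parse_status(text):
    """Parse 'Key: value' lines, as printed by cman_tool status, into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_status(before, after):
    """Return {key: (before_value, after_value)} for every field that differs."""
    keys = sorted(set(before) | set(after))
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }


if __name__ == "__main__":
    # Taken from the status output posted above.
    before_text = "Nodes: 3\nExpected_votes: 1\nTotal_votes: 3\nQuorum: 2\n"
    # HYPOTHETICAL values, invented only to exercise the diff.
    after_text = "Nodes: 2\nExpected_votes: 1\nTotal_votes: 2\nQuorum: 2\n"

    changes = diff_status(parse_status(before_text), parse_status(after_text))
    for key, (old, new) in changes.items():
        print(f"{key}: {old} -> {new}")
```

Running this once per node on the saved captures would show at a glance which fields changed across the reboot.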

-- 
email:  phung at cs.columbia.edu
www:    http://www.cs.columbia.edu/~phung
phone:  646-775-6090
office: CS Dept. 520, 1214 Amsterdam Ave., MC 0401, New York, NY 10027