[Linux-cluster] two fencing problems

Greg Forte gforte at leopard.us.udel.edu
Wed Dec 7 19:32:37 UTC 2005


Eric Kerin wrote:
 > Greg,
 >
 > I'm using the fence_apc agent on my cluster with APC 7900s, and fencing
 > is working perfectly for me, and has for more than 6 months now.

Thanks, Eric, but the fence_apc script is definitely not the issue - I 
had to make a couple of minor changes to fence_apc's regexps, and it now 
works both with command-line options and with arguments passed through 
stdin.  That doesn't explain why the cluster.conf doesn't work when it 
has "off" and then "on" as set up by system-config-cluster (and it did 
that itself - all I did was configure the IP address and login for the 
fence devices and tell it which ports to use), but does work when I 
change the option to 'reboot' as described in my previous message 
(reboot is the default action anyway, which I assume is why yours works 
with no "option=" attribute at all).

> You can test that the cluster is configured correctly to fence a node by
> running "fence_node <nodename>"  This will use the cluster's config file
> to fence the node, ensuring that all config settings are correct.

Actually, that doesn't seem to work for me - no matter what nodename I 
specify, and regardless of whether I run it on the node I'm trying to 
fence or on the other node (it's a two-node cluster), it comes back with 
"Fence of 'hostname' was unsuccessful."  I suspect this is because it's 
a two-node cluster, so fenced doesn't want to let me kick out a node 
that's still active ... or maybe it's just a host name problem. 
Regardless, fencing _does_ work correctly if I simulate a real failure, 
now that I've made the aforementioned cluster.conf change, so I'm 
confident that I've got it configured correctly.  My gripe is that (a) 
the GUI tool can't seem to generate even the simplest conf correctly, 
and (b) there's apparently a bug in fenced where it passes "option=on" 
to the fence_apc agent when it clearly should be "option=off".  Or else 
ccsd is misparsing the cluster.conf file.  I don't see how else to 
explain that the conf file said "off", then "on", but the daemon did 
"on", "on".

> When updating the cluster.conf file by hand, you are updating the
> config_version attribute of the cluster node, right?  I do updates to my
> cluster.conf file by hand pretty much exclusively, while the cluster is
> running, and with no problems whatsoever.  Changes propagate as expected
> after running "ccs_tool update <cluster.conf filename>"and "cman_tool
> version -r <new_version_number>"

Hmmm ... nope, but I will do so in the future.  ;-)  Thanks.
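For anyone else following along, the full sequence would presumably be 
something like this (path and version numbers are just placeholders):

    # edit /etc/cluster/cluster.conf and bump config_version="3" to "4"
    ccs_tool update /etc/cluster/cluster.conf
    cman_tool version -r 4

i.e. bump the config_version attribute on the <cluster> element, push 
the new file out with ccs_tool, then tell cman about the new version.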

-g

Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE



