[Linux-cluster] two fencing problems

Greg Forte gforte at leopard.us.udel.edu
Wed Dec 7 15:08:20 UTC 2005


Greg Forte wrote:

> And it still doesn't appear to work ... I can turn the outlets on and 
> off from the command line, but if I down the interface on a node, the 
> other node reports that it's removing the "failed" node from the 
> cluster, and that it's fencing the "failed" node, but the "failed" node 
> never gets shut down.  Does this get logged somewhere besides 
> /var/log/messages, or is there a way to force it to be more verbose?  If 
> I could see what command fenced is actually invoking that might help ...

Well, in case anyone is interested: I got fed up with having no decent
logging from any of these components, so I finally used tcpdump to
monitor the telnet connection between the surviving node and the PDUs
while it tried to fence the failed node. It turns out that fence_apc was
trying to turn each port ON twice, instead of OFF and then ON as my
configuration specifies. The fault apparently lies somewhere in ccsd or
fenced, because the fence_apc script definitely responds properly to the
on|off|reboot options, both on the command line and when they are passed
on stdin the way fenced does it.
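The post only says that fence_apc accepts its options on stdin the way fenced supplies them. As a rough illustration of how to reproduce by hand what the agent sees, the Python sketch below merges a shared `<fencedevice>` entry with a per-node `<device>` entry and emits one `name=value` line per attribute. The `<clusternode>`/`<fencedevices>` layout, the `ipaddr`/`login`/`passwd` attribute names, and all sample values are assumptions for illustration; this is not fenced's actual code.

```python
# Hypothetical sketch: reconstruct the name=value lines a fencing daemon could
# write to an agent's stdin, from cluster.conf-style XML. The sample addresses,
# logins, and the fencedevice attribute names (ipaddr, login, passwd) are
# illustrative assumptions, not taken from the post.
import xml.etree.ElementTree as ET

SAMPLE_CONF = """<cluster>
  <clusternodes>
    <clusternode name="node1">
      <fence>
        <method name="1">
          <device name="FENCE1" option="reboot" port="1"/>
          <device name="FENCE2" option="reboot" port="1"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="FENCE1" agent="fence_apc" ipaddr="10.0.0.1" login="apc" passwd="apc"/>
    <fencedevice name="FENCE2" agent="fence_apc" ipaddr="10.0.0.2" login="apc" passwd="apc"/>
  </fencedevices>
</cluster>
"""


def agent_inputs(conf_xml, node_name):
    """Return (agent, stdin_text) pairs for node_name, in config order."""
    root = ET.fromstring(conf_xml)
    shared = {d.get("name"): d.attrib for d in root.iter("fencedevice")}
    calls = []
    for node in root.iter("clusternode"):
        if node.get("name") != node_name:
            continue
        for dev in node.iter("device"):
            # Start from the shared <fencedevice> attributes, then let the
            # per-node <device> attributes (port, option) add to or override them.
            params = dict(shared[dev.get("name")])
            params.update(dev.attrib)
            # This sketch uses "agent" only to pick the program to run and does
            # not forward it; whether the real daemon does is not stated.
            agent = params.pop("agent")
            stdin_text = "".join(f"{k}={v}\n" for k, v in params.items())
            calls.append((agent, stdin_text))
    return calls


if __name__ == "__main__":
    for agent, stdin_text in agent_inputs(SAMPLE_CONF, "node1"):
        print(f"--- {agent} ---")
        print(stdin_text, end="")
```

To feed the text to a real agent by hand, something like `subprocess.run([agent], input=stdin_text, text=True)` would do it, assuming the agent is on the PATH; note that this really switches the outlet.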

I changed my cluster.conf so that it uses 'reboot' instead of 'off' and
'on'. The old conf looked like this:

        <device name="FENCE1" option="off" port="1"/>
        <device name="FENCE2" option="off" port="1"/>
        <device name="FENCE1" option="on" port="1"/>
        <device name="FENCE2" option="on" port="1"/>

and the new one looks like this:

        <device name="FENCE1" option="reboot" port="1"/>
        <device name="FENCE2" option="reboot" port="1"/>

I also increased the reboot wait time on the PDUs to make sure they'd
wait long enough, and that SEEMS to work (once I remembered to stop ccsd
before editing cluster.conf by hand, so that it didn't immediately
replace my edited file with the old one ;-).

Of course, I still can't bring up any of the per-node fencing
configuration items in system-config-cluster, but I think I mentioned
that previously: when I set them up through the GUI, it put "switch="
options in each <device /> tag, and when I restarted the GUI it
complained that the file was improperly formatted. I removed those
options by hand and the GUI worked again, but ever since then the
fencing info hasn't been available ...

Would any developers care to comment on any of this? I'm finding it
really tough to believe that this is a supported Red Hat "product".

-g

Greg Forte
gforte at udel.edu
IT - User Services
University of Delaware
302-831-1982
Newark, DE
