[Linux-cluster] Fencing Device Question

Brandon Young bkyoung at gmail.com
Tue Jun 3 17:55:57 UTC 2008


In my GFS cluster, I use DRAC cards as the fencing device for each node.
Yesterday, the DRAC card on one node failed: it would not allow remote
logins, etc., but it still answered pings.  I don't know how long the card
had been dead; I only noticed because I tried to fence the node manually
and fencing failed ... which caused me all sorts of other fun recovering
the cluster afterwards.  So, I have uncovered a pretty scary failure
scenario for my cluster configuration.

My question is: what (if anything) can RHCS/GFS do to determine the
health/presence/operation of fencing devices?  If it can monitor the
fencing devices and discovers a bad one, what will it do?  For example, if
I unplug the network cable for the heartbeat, the node gets fenced
immediately.  I never tested whether the same would happen if I unplugged
a fencing device.  I haven't delved into the documentation in a while, but
I don't remember anything about a way to have redundant fencing devices,
such as a DRAC plus a network power switch.  Is there a way?
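
To make the kind of check being asked about concrete: a ping test would not
have caught the failure described above, since the card answered pings but
refused logins.  A minimal sketch of a slightly stronger external check,
assuming the fencing device accepts TCP connections on its management port
(the function name and the host/port in the note below are hypothetical and
not part of RHCS/GFS):

```python
import socket


def fence_device_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that the management service accepts connections. It
    does not prove that a login or a power-cycle command would succeed.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False
```

Something like fence_device_reachable("drac-node1.example.com", 23) could be
run periodically from cron to raise an alert when it returns False.  Port 23
(telnet) is an assumption here; the right port depends on how the card is
configured.  A device that accepts connections but rejects logins would
still pass this check, so it narrows the blind spot rather than closing it.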

Thoughts, opinions, insight, documentation, etc. would be greatly
appreciated.

--
Brandon