[Linux-cluster] RHSCv4 2-node cluster hangs while starting fenced

Wed Nov 2 13:30:01 UTC 2005

Vlad, you are misunderstanding what he is trying to use the iLO card for.

In this setup it is being used as a fencing device, not a network card.
(As you say, it is indeed not a network card)
In other words, the iLO card in node A is used by the other nodes in the
cluster to fence (in this case forcefully power off) node A. The
iLO card is _not_ used by node A for any ethernet traffic.

To quote from an old mail on this topic:

Fencing is the act of forcefully preventing a node from being able to
access resources after that node has been evicted from the cluster.
This is done in an attempt to avoid corruption.

The canonical example of when it is needed is the live-hang scenario, as
you described:

1. node A hangs with I/Os pending to a shared file system
2. node B and node C decide that node A is dead and recover resources
allocated on node A (including the shared file system)
3. node A resumes normal operation
4. node A completes I/Os to shared file system

At this point, the shared file system is probably corrupt. If you're
lucky, fsck will fix it -- if you're not, you'll need to restore from
backup. I/O fencing (STONITH, STOMITH, or whatever we want to call it)
prevents the last step (step 4) from happening.

How fencing is done (power cycling via external switch, SCSI
reservations, FC zoning, integrated methods like IPMI, iLO, manual
intervention, etc.) is unimportant - so long as whatever method is used
can guarantee that step 4 cannot complete.
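
The scenario quoted above can be sketched as a toy simulation in Python.
This is only an illustration under simplifying assumptions: SharedFS,
live_hang and the "fenced" set are made-up names, and real fencing acts on
hardware (power, SCSI reservations, FC zoning), not on an in-memory set.

```python
# Toy model of the live-hang scenario. All names are hypothetical and
# nothing here corresponds to the real fenced daemon or any fence agent.

class SharedFS:
    """Shared storage that rejects I/O from fenced nodes."""

    def __init__(self):
        self.owner = "A"       # node currently holding the resources
        self.fenced = set()    # nodes cut off from the storage
        self.corrupt = False

    def write(self, node):
        if node in self.fenced:
            return False       # I/O blocked: step 4 cannot complete
        if node != self.owner:
            self.corrupt = True  # stale write landed after recovery
        return True


def live_hang(use_fencing):
    """Run steps 1-4 and report whether the file system got corrupted."""
    fs = SharedFS()
    pending = "A"              # 1. node A hangs with I/O pending
    if use_fencing:
        fs.fenced.add("A")     # fence A before recovering its resources
    fs.owner = "B"             # 2. B and C recover A's resources
    fs.write(pending)          # 3. A resumes, 4. A completes its I/O
    return fs.corrupt
```

Calling live_hang(False) models the unfenced cluster, where the stale write
from node A reaches the file system; live_hang(True) models the same
sequence with node A fenced first. How the entry gets into the "fenced" set
stands in for whichever fencing method is actually used.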

Please excuse my English