[Linux-cluster] question about what happens when fencing fails

David Monro david.monro at adelaide.edu.au
Fri Jun 20 05:26:31 UTC 2008


Hi,

I'm just trying to get my head around the way CS5 copes with various
failure modes.

I have 2 sites, with one node and a SAN at one site, and a second node
at the other site. (There is also a second SAN at the other site,
holding a SAN copy of the primary for disaster recovery.) I am using HP
iLO fencing.

The most likely failure scenario for us at the moment is complete loss
of the ethernet network between the 2 sites, with the SAN remaining up.
Obviously in this case neither node will be able to see the other, and
since iLO fencing goes over that same network, neither will be able to
fence the other.

In the case where I do not use a quorum disk, what will happen? I would
have to guess that the answer will be a dead cluster, since neither node
can succeed at fencing the other.

In the case where I do use a quorum disk, what will happen? Both nodes
will still have access to the quorum disk, but neither can fence the
other.
Assuming both still achieve their minimum score, they will presumably
have some sort of fight over the quorum disk - how does that get
resolved, and how can the winning node be sure it has won? (I think I
managed to provoke this scenario by accident when messing around the
other day, and one of the nodes started spitting lots of messages out
about the other node being undead - I'm not sure if the cluster was
quorate at the time or not).
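
As a rough illustration of why both nodes might carry on in this
scenario, here is a minimal Python sketch of the vote arithmetic. It
assumes one vote per node, one vote for the quorum disk, and a simple
majority rule; none of these values come from my actual configuration,
and the sketch ignores any arbitration qdiskd itself performs over the
disk.

```python
# Hypothetical vote arithmetic for a two-node cluster with a quorum disk.
# Assumptions (illustrative only): each node has 1 vote, the quorum disk
# has 1 vote, and quorum is a simple majority of the expected votes.


def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(visible_votes: int, expected_votes: int) -> bool:
    """True if the votes a partition can see reach the majority threshold."""
    return visible_votes >= quorum_threshold(expected_votes)


NODE_VOTES = 1
QDISK_VOTES = 1
EXPECTED_VOTES = 2 * NODE_VOTES + QDISK_VOTES

# After the inter-site ethernet fails, each node sees only itself
# plus the quorum disk (the SAN is still reachable from both sites).
votes_seen_by_each_node = NODE_VOTES + QDISK_VOTES

both_sides_quorate = is_quorate(votes_seen_by_each_node, EXPECTED_VOTES)
print(both_sides_quorate)
```

On vote counting alone, each side would consider itself quorate under
these assumptions, which is why the question of how the fight over the
quorum disk gets resolved matters.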

I did look at other fencing options as well, but I can't use fence_scsi
(because we use dm_multipath - a pity, because I think it's about the
one thing that should actually work in this scenario!), or fence_brocade
(because the node can't get to the ethernet port on the switch at the
other site).

Presumably a carefully chosen heuristic could allow one node to remove
itself from the fight over the qdisk. In that case, will the cluster be
OK even though the remaining node can't prove that the one with the
less-than-minimum score is actually dead?
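
To make the heuristic idea concrete, here is a minimal Python sketch of
how a score check could take one node out of the contest. The heuristic
names, scores, and minimum score are all hypothetical, chosen only to
show one site passing and the other failing; this is not qdiskd's actual
code.

```python
# Hypothetical sketch of heuristic scoring for a quorum-disk contest.
# All names and values below are illustrative assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class Heuristic:
    name: str     # label for the check, e.g. a ping to some router
    score: int    # points awarded when the check passes
    passed: bool  # result of the most recent run of the check


def total_score(heuristics: List[Heuristic]) -> int:
    """Sum the scores of the heuristics that currently pass."""
    return sum(h.score for h in heuristics if h.passed)


def meets_min_score(heuristics: List[Heuristic], min_score: int) -> bool:
    """True if the node's score is high enough to stay in the contest."""
    return total_score(heuristics) >= min_score


MIN_SCORE = 1  # assumed minimum score

# Assume the check targets a router at the primary site, so only the
# primary-site node can still reach it once the inter-site link is down.
primary_node = [Heuristic("ping primary-site router", 1, True)]
secondary_node = [Heuristic("ping primary-site router", 1, False)]

print(meets_min_score(primary_node, MIN_SCORE))
print(meets_min_score(secondary_node, MIN_SCORE))
```

Under these assumptions only the primary-site node stays above the
minimum. The sketch says nothing about whether that node may then
proceed without a successful fence, which is the part I am unsure of.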

Any guidance would be much appreciated.

Cheers,

	David



