[Linux-cluster] Cluster Shutdown - ideas?
ccaulfie at redhat.com
Tue Aug 12 10:50:33 UTC 2008

One thing that cman does rather badly is a full cluster shutdown. With
the RHEL4 code you would shut each node down in turn using the init
scripts and find that everything hung as the cluster lost quorum when
the N/2th node went down.
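
Here is a small sketch of the quorum arithmetic that shows why things
stop around the N/2th node. It assumes every node has one vote and that
quorum is a simple majority of the expected votes, expected_votes // 2 + 1.
The special two-node case is ignored, and the function names are made
up for illustration.

```python
def quorum(expected_votes: int) -> int:
    """Simple-majority quorum (assumed rule: half the expected votes plus one)."""
    return expected_votes // 2 + 1


def node_that_loses_quorum(nodes: int) -> int:
    """Shut nodes down one at a time without touching expected_votes
    (the RHEL4 behaviour) and return the position of the shutdown
    that leaves the surviving nodes inquorate."""
    needed = quorum(nodes)  # expected_votes never changes in this model
    for down in range(1, nodes + 1):
        remaining = nodes - down
        if remaining < needed:
            return down
    # Not reached: zero remaining votes is always below quorum.
    raise AssertionError("quorum was never lost")


for n in (4, 5, 8):
    print(n, "nodes: quorum lost at shutdown number", node_that_loses_quorum(n))
```

For an even-sized cluster this is exactly the N/2th node, and for an odd
size it is one later. Every node shut down after that point is working
in an inquorate cluster, which is where the hang begins.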

With RHEL5 the init script was changed to do a "cman_tool leave remove",
which tells the remaining nodes to reduce quorum to allow for the
departing node.
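
Here is a rough model of what "leave remove" buys you, under the same
one-vote, simple-majority assumptions. Each departing node also lowers
the vote count that quorum is computed from, so the survivors stay
quorate all the way down. This simulates the arithmetic only. It is
not cman's actual code, and the function names are made up.

```python
def quorum(expected_votes: int) -> int:
    # Assumed rule: simple majority of the expected votes.
    return expected_votes // 2 + 1


def shutdown_with_remove(nodes: int) -> list[bool]:
    """Shut nodes down one at a time, each doing the equivalent of a
    "leave remove": expected_votes drops by one vote per departure.
    Returns, for each shutdown, whether the surviving nodes are quorate."""
    expected = nodes
    quorate_after = []
    for down in range(1, nodes):  # stop before the last node leaves
        remaining = nodes - down
        expected -= 1  # the departing node's vote no longer counts
        quorate_after.append(remaining >= quorum(expected))
    return quorate_after


print(shutdown_with_remove(5))
```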

I don't really like either of these solutions. The RHEL4 way is
obviously a nuisance, but even the RHEL5 system is wrong IMHO. A normal
node shutdown should not reduce quorum: if other nodes fail while that
node is down, the cluster runs the risk of a split brain due to the
reduced quorum.
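
One way to put the split-brain concern in numbers, again assuming one
vote per node and simple-majority quorum: if quorum is computed from the
full vote count, two disjoint groups of nodes can never both be quorate,
because twice the quorum exceeds the total. Once expected_votes has been
lowered, that guarantee can disappear if the departed nodes come back
during a partition while the nodes are still using the lowered figure.
The helper below is a hypothetical check, not part of cman.

```python
def quorum(expected_votes: int) -> int:
    # Assumed rule: simple majority of the expected votes.
    return expected_votes // 2 + 1


def split_brain_possible(total_votes: int, expected_votes: int) -> bool:
    """True if the full set of one-vote nodes (total_votes of them) could
    be split into two disjoint partitions that both reach quorum when
    quorum is computed from expected_votes."""
    q = quorum(expected_votes)
    return 2 * q <= total_votes


# Six one-vote nodes: quorum from the configured total is safe...
print(split_brain_possible(total_votes=6, expected_votes=6))
# ...but if two "leave remove" shutdowns lowered expected_votes to 4,
# a 3/3 partition of all six nodes would give two quorate halves.
print(split_brain_possible(total_votes=6, expected_votes=4))
```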

Those of you who have worked with VMS systems know that VMS has a
CLUSTER_SHUTDOWN option, which makes the cluster software wait until
all nodes have reached a shutdown barrier and then lets all of them go
down at the same time. We could do this with Linux, but I'm not really
sure how useful it would be, mainly because the cluster software sits at
a higher level in the OS than it does in VMS, and there is a lot more
for the computer to do after the cluster software has shut down. It is
an option, though.
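
The VMS-style barrier can be modelled in a single process. The sketch
below uses Python's threading.Barrier as a stand-in for a cluster-wide
shutdown barrier. Each thread plays a node: it finishes its own local
shutdown work, waits until every node has arrived, and only then leaves.
In a real cluster the barrier would have to be negotiated over the
cluster protocol. Nothing here is cman code, and the node names are
hypothetical.

```python
import threading

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names
barrier = threading.Barrier(len(NODES))
events: list[str] = []
lock = threading.Lock()


def record(msg: str) -> None:
    with lock:
        events.append(msg)


def shut_down(node: str) -> None:
    record(f"{node}: local services stopped")
    barrier.wait()  # block until every node has reached the shutdown barrier
    record(f"{node}: leaving cluster")


threads = [threading.Thread(target=shut_down, args=(n,)) for n in NODES]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No node leaves until all nodes have finished their local shutdown.
first_leave = next(i for i, e in enumerate(events) if "leaving" in e)
print(first_leave == len(NODES))
```

The objection above still applies: once the barrier releases, each
machine still has everything below the cluster layer left to shut down.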

The other option is simply to set a flag (either in CMAN or locally) to
tell the node, or the whole cluster, that everyone is being shut down.
There are a few ways of doing this. The simplest is to add a flag to the
cman init script (basically the opposite of what happens now in RHEL5)
that makes it do a "cman_tool leave remove". But that requires the
cluster software to be shut down independently of the rest of the
software, which defeats the purpose of ordered init scripts.

So the flag could perhaps be an environment variable checked by the
init script (do those get passed through?), or a flag inside cman
itself that changes the "leave" behaviour to do either a "leave remove"
or the synchronised cluster shutdown I mentioned earlier.
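
Either form of the flag comes down to the same decision at the point
where the node leaves. Here is a minimal sketch, assuming a hypothetical
environment variable CLUSTER_SHUTDOWN read by whatever performs the
leave. When it is set, the node does a "leave remove" as RHEL5 does
today. Otherwise it does a plain "leave" and quorum is left alone. Only
the "cman_tool leave" and "cman_tool leave remove" commands come from
the discussion above. The variable name and function are invented for
illustration, and the synchronised-shutdown alternative is not modelled.

```python
import os
import shlex
from typing import Mapping, Optional


def leave_command(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Choose how this node leaves the cluster.

    CLUSTER_SHUTDOWN is a hypothetical flag meaning "the whole cluster is
    going down"; only then is quorum reduced with "leave remove"."""
    env = os.environ if env is None else env
    if env.get("CLUSTER_SHUTDOWN", "") in ("1", "yes", "true"):
        return ["cman_tool", "leave", "remove"]  # whole-cluster shutdown
    return ["cman_tool", "leave"]  # ordinary node shutdown: keep quorum


print(shlex.join(leave_command({})))
print(shlex.join(leave_command({"CLUSTER_SHUTDOWN": "1"})))
```

The command is only built and printed here. A real init script would
run it, and the open question above, whether such a variable reaches
the init script at all, is exactly what this sketch assumes away.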

Does anyone have any preferences, ideas or other options we might consider?