[Linux-cluster] Cluster Shutdown - ideas?

Tue Aug 12 12:05:11 UTC 2008

Hi,

On Tue, 2008-08-12 at 11:50 +0100, Christine Caulfield wrote:
> One thing that cman does rather badly is a full cluster shutdown. With
> the RHEL4 code, if you shut each node down in turn using the init
> scripts, everything hung as the cluster lost quorum when the (N/2)th
> node went down.
> 
> With RHEL5 the init script was changed to do a "cman_tool leave remove" 
> which tells the remaining nodes to reduce quorum to allow for the 
> missing node(s).
> 
> I don't really like either of these solutions. The RHEL4 way is 
> obviously a nuisance, but even the RHEL5 system is wrong IMHO. A normal 
> node shutdown should not reduce quorum. If other nodes fail while that
> node is down, the cluster runs the risk of a split brain due to reduced
> quorum.
> 
> Those of you who have worked with VMS systems know that that OS has a 
> CLUSTER_SHUTDOWN option which causes the cluster software to wait until 
> all nodes have reached a shutdown barrier and then allows all of them to 
> go down at the same time. We could do this with Linux, but I'm not 
> really sure how much use it would be, mainly because the cluster
> software sits at a higher level in the OS than it does on VMS, and there
> is a lot more for the computer to do after the cluster software has shut
> down. It is an option, though.
> 
> The other option is simply to set a flag (either in CMAN or locally) to 
> tell the node or the whole cluster that everyone is being shut down. 
> There are a few ways of doing this; the simplest is to add a flag to the
> cman init script (basically the opposite of what happens now in RHEL5)
> that makes it run "cman_tool leave remove". But that requires the cluster
> software to be shut down independently of the rest of the software, thus
> defeating the point of ordered init scripts.
> 
> So the flag could perhaps be an environment variable checked by the
> init script (do those get passed through?), or a flag inside cman
> itself that changes the "leave" behaviour to do either a "leave remove"
> or the synchronised cluster shutdown I mentioned earlier.
> 
> Does anyone have any preferences, ideas or other options we might consider?
> 
> Chrissie
> 
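The quorum arithmetic behind the two shutdown behaviours described above can be sketched in a few lines of Python. This is a minimal model, not cman code: it assumes one vote per node, a simple-majority quorum of expected_votes // 2 + 1, and that "leave remove" lowers expected votes by the departing node's single vote. It ignores special cases such as two-node clusters and quorum disks, and the function names are hypothetical.

```python
from typing import List, Tuple


def quorum(expected_votes: int) -> int:
    """Simple majority: strictly more than half of the expected votes."""
    return expected_votes // 2 + 1


def shutdown_in_turn(nodes: int, remove_on_leave: bool) -> List[Tuple[int, int, bool]]:
    """Shut nodes down one at a time, assuming one vote per node.

    remove_on_leave=False models the RHEL4 behaviour: expected votes stay
    at `nodes`, so the quorum threshold never drops.
    remove_on_leave=True models "cman_tool leave remove" (RHEL5): the
    remaining nodes lower expected votes by one for each departing node.

    Returns (nodes_still_up, quorum_threshold, quorate) after each departure.
    """
    expected = nodes
    alive = nodes
    steps = []
    while alive > 1:
        alive -= 1
        if remove_on_leave:
            expected -= 1
        q = quorum(expected)
        steps.append((alive, q, alive >= q))
    return steps


if __name__ == "__main__":
    # Four-node cluster, RHEL4-style: quorum is lost at the second (N/2th) leave.
    print(shutdown_in_turn(4, remove_on_leave=False))
    # -> [(3, 3, True), (2, 3, False), (1, 3, False)]
    # Same cluster with "leave remove": every step stays quorate.
    print(shutdown_in_turn(4, remove_on_leave=True))
    # -> [(3, 2, True), (2, 2, True), (1, 1, True)]
```

The second run illustrates the objection to reducing quorum on an ordinary shutdown: by the last step a single node is quorate on its own, whereas the configured four-node cluster needed three votes.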
I think this is part of a larger problem that we have. Currently GFS2
has a shutdown issue where filesystems that were mounted by means other
than the init scripts cause hangs at node shutdown time. This is due to
the ordering of the system shutdown scripts (kill off all userland
processes, then unmount filesystems), so the userland cluster daemons
that a GFS2 unmount depends on are already gone when it is attempted.
There are pending bzs relating to this issue: #435906 and #207697.

I have to say that I still like the idea of a cluster run-level. Upon
reaching the "normal" multiuser run level, a process would run to join
the cluster; once quorum was reached, we'd go into a special cluster
run level, and when the cluster lost quorum, the remaining nodes would
drop back into the "normal" run level. From your description, I think
that would be similar to what VMS used to do.

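The cluster run-level idea amounts to a two-state machine driven by quorum changes. The Python sketch below models only that; the RunLevel names, the stream of quorum reports, and the service start/stop hook point are hypothetical and do not correspond to any real init system or cman API.

```python
from enum import Enum
from typing import Iterable, List


class RunLevel(Enum):
    NORMAL = "multiuser"  # ordinary multi-user run level; not in a quorate cluster
    CLUSTER = "cluster"   # hypothetical extra run level, held only while quorate


def next_runlevel(current: RunLevel, quorate: bool) -> RunLevel:
    """Enter CLUSTER when quorum is reached; fall back to NORMAL when it is lost."""
    if current is RunLevel.NORMAL and quorate:
        return RunLevel.CLUSTER
    if current is RunLevel.CLUSTER and not quorate:
        return RunLevel.NORMAL
    return current


def replay(quorum_events: Iterable[bool]) -> List[RunLevel]:
    """Feed a sequence of quorum-state reports and record each run-level change.

    The node starts in NORMAL: it has reached the multiuser run level and
    the (hypothetical) join process has just been started.
    """
    level = RunLevel.NORMAL
    history = [level]
    for quorate in quorum_events:
        new_level = next_runlevel(level, quorate)
        if new_level is not level:
            # In a real system, cluster-dependent services (e.g. GFS2 mounts)
            # would be started or stopped here, in order.
            history.append(new_level)
        level = new_level
    return history


if __name__ == "__main__":
    # Join, wait for quorum, stay quorate for a while, then lose quorum.
    print([lvl.name for lvl in replay([False, True, True, False])])
    # -> ['NORMAL', 'CLUSTER', 'NORMAL']
```

A possible appeal of this design, given the GFS2 shutdown ordering problem, is that leaving the cluster run level would presumably give a single ordered point at which to unmount cluster filesystems before the rest of userland is torn down.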
Steve.

> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster