[Linux-cluster] explanation of how disallowed nodes work
ccaulfie at redhat.com
Tue Mar 31 07:31:18 UTC 2009
Brett Cave wrote:
> On a 6 node cluster, 2 nodes (1 & 6) were fenced. On coming back up,
> the 2 nodes were not able to start the cman service.
> All the other nodes have activity blocked. The disallowed nodes (from
> cman_tool status) are:
> node2: 3,4,5
> node3: 2,4,5
> node4: 2,3,5
> node5: 2,3,4
> node1 & node6 - cman not running.
> I'm using qdisk, and all running nodes have the disallowed list flagged
> as "d" (disallowed).
> Each node then also has:
> X (not a cluster member) for qdisk and the 2 fenced nodes that cman
> will not start on.
> d (on the 3 running nodes other than current)
> M (on the self-node - i.e. if run on node2, then node2 = M)
> This is what I get in the logs when I try to start cman on one of the X nodes:
> openais: CMAN: Joined a cluster with disallowed nodes. must die
> I can't get the nodes to restart cman, and "service gfs stop" (to unmount
> the GFS mounts) hangs; the following process never completes:
> /sbin/umount.gfs /my/mountpoint1
> Is there a way to get the cluster to recover from this? I'm going to
> fence all the nodes now to get the system back up.
The cman_tool man page has some detail on disallowed mode, but also
check your cman version: cman in RHEL 5.3 has a bug that can cause this
to happen. I believe a hotfix is in the works somewhere...
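To make the status flags in the quoted report concrete (M for the local member, X for not a cluster member, d for disallowed), here is a minimal Python sketch that groups nodes by flag. It assumes a simplified table of node id, one-letter status and node name, loosely modelled on the Sts column of `cman_tool nodes`. The real column layout may differ between cman versions. The `group_by_status` helper and the sample data are hypothetical, reconstructed from node2's view as described above.

```python
from collections import defaultdict

# Meaning of the one-letter status flags described in the report above.
FLAG_MEANING = {
    "M": "member",
    "X": "not a cluster member",
    "d": "disallowed",
}


def group_by_status(table: str) -> dict[str, list[str]]:
    """Group node names by their status flag.

    Assumes (hypothetically) a header line followed by rows whose second
    column is the one-letter status and whose last column is the node name.
    This is a simplification, not a documented cman_tool output format.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        status, name = fields[1], fields[-1]
        label = FLAG_MEANING.get(status, "unknown (" + status + ")")
        groups[label].append(name)
    return dict(groups)


# Hypothetical view from node2, matching the situation in the report:
# node2 is M, nodes 3-5 are d, nodes 1 and 6 and the quorum disk are X.
sample = """
Node  Sts  Name
   0   X   qdisk
   1   X   node1
   2   M   node2
   3   d   node3
   4   d   node4
   5   d   node5
   6   X   node6
"""

for label, names in group_by_status(sample).items():
    print(f"{label}: {', '.join(names)}")
```

Unknown flags are reported rather than dropped, so a status letter this sketch does not know about still shows up in the output.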