[Linux-cluster] restarting cluster from one node to multiple (Was: Re: rhel6 node start causes power on of the other one)

Fabio M. Di Nitto fdinitto at redhat.com
Tue Mar 22 18:57:17 UTC 2011

On 03/22/2011 04:41 PM, bergman at merctech.com wrote:
> The pithy ruminations from "Fabio M. Di Nitto" <fdinitto at redhat.com> on "Re: [Linux-cluster] rhel6 node start causes power on of the other one" were:
> => Hi,
> => 
> => On 3/22/2011 11:12 AM, Gianluca Cecchi wrote:
> 	[SNIP!]
> => > 
> => > If the initial situation is both nodes down and I start one of them, I
> => > get it powering on the other, that is not my intentional target...
> 	[SNIP!]
> => 
> => This is expected behavior.
> => 
> 	[SNIP!]
> => I am not sure why you want a one node cluster, but one easy workaround
> Sometimes, it's not a matter of "wanting" a one-node cluster, but
> being forced to have one temporarily. For example, if there's a hardware
> failure in one node of a 2-node cluster. I think that a likely scenario
> is that there's an event (for example, a power outage) that shuts down
> all nodes in a cluster, and that there is subsequent damage from that
> event (hardware failure, filesystem corruption on the local storage,
> etc.) that prevents some nodes from being restarted.

If the hardware has failed or doesn't boot, fencing will still happen
from the remaining node, succeed (assuming the fencing device has not
gone bad too) and the node will keep working. The failed node at that
point does NOT need to rejoin the cluster for the surviving node to keep
working.

The problem is that we need to differentiate between normal operations
and special situations.

In a special situation like the one you describe, you might have to go
to the server to do hw repair. In that case, just unplug the power cords
from the fencing device (assuming power fencing) and wait for the
remaining node to fence and keep working.

If fencing fails, then you can use fence_ack_manual to override the
"wait for fencing" condition on the surviving node and allow it to
operate (and make absolutely sure the bad node is really off for good or
bad things will happen).

> => is to start both of them at the same time, and then shutdown one of
> If both nodes are not available, this is not an easy work-around.
> => them. At that point they have both seen each other and the one going
> => down will tell the other "I am going offline, no worries, it's all good".
> => 
> What are the recommended alternative methods to starting a single-node
> on a cluster? If the number of expected votes is set to the number of
> votes for the single node, I'm able to start a single node. However, I'm
> not sure what will happen if additional nodes in the cluster are started
> later...will there be fencing or split-brain issues if "expected votes"
> is "1" when there are 2 nodes in the cluster?

So this area is delicate. Adding nodes to a running cluster when the
number of nodes is >= 2 is easy. Adding nodes from 1 to 3 is delicate.

In some random tests I did (these are not officially supported
operations), I was able to start a one node cluster (with literally one
node in the config) and go up to 16.

Then, assuming you are on rhel6 (didn't test 5) and start with a one
node cluster that's up and running:

- create a config with a higher version, and 2 nodes (you cannot bump
from 1 to an arbitrary number of nodes in one go or you will risk
fencing/split brain; there are some rules related to quorum that could
cause issues if not followed strictly).

- copy the new config to both nodes

- start cman on the node you want to add.

At this point, the new node will join with a config at running_version+1,
immediately triggering a config reload on the active single node, which
will see the new node and recalculate quorum.
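
The config change in the first step can be sketched roughly as follows.
This is only an illustration, not a supported tool: it assumes the usual
cluster.conf layout (a <cluster> root with a config_version attribute and
a <clusternodes> list). The cluster name, the node names and the add_node
helper are hypothetical, and fencing and all other sections are left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-node config; a real cluster.conf also carries
# fencing and other sections that are omitted here.
ONE_NODE = """<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1"/>
  </clusternodes>
</cluster>"""


def add_node(conf_xml, name):
    """Return a new config with one extra node and config_version + 1."""
    root = ET.fromstring(conf_xml)
    nodes = root.find("clusternodes")
    # Next free node id, assuming ids are small positive integers.
    next_id = max(int(n.get("nodeid")) for n in nodes) + 1
    ET.SubElement(nodes, "clusternode", name=name, nodeid=str(next_id))
    # The joining node must carry a version higher than the running one.
    root.set("config_version", str(int(root.get("config_version")) + 1))
    return ET.tostring(root, encoding="unicode")


two_node = add_node(ONE_NODE, "node2")
print(two_node)
```

The helper adds exactly one node per call and bumps the version by one,
which matches the "one node at a time" approach described here.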

You should be able to repeat the same operation adding one node at a
time, in some cases more, but since it's a complex and delicate
calculation, stick to one.
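
As a rough illustration of why one node at a time is the conservative
choice, here is a sketch of the arithmetic. It assumes one vote per node,
a simple-majority quorum (expected_votes // 2 + 1), no quorum disk and no
two_node special mode. The function names are made up for the example,
and the real quorum rules have more cases than this.

```python
def quorum(expected_votes):
    # Simple majority, assuming one vote per node.
    return expected_votes // 2 + 1


def newcomers_quorate_alone(old_nodes, new_nodes):
    """True if the added nodes could reach quorum without the old ones."""
    total = old_nodes + new_nodes
    return new_nodes >= quorum(total)


# (running nodes, nodes added in one config change)
for old, new in [(1, 1), (1, 2), (2, 1), (3, 1)]:
    print(old, "->", old + new,
          "quorum:", quorum(old + new),
          "new nodes quorate alone:", newcomers_quorate_alone(old, new))
```

Under these assumptions, going from 1 to 3 in one step is the only case
listed where the new nodes could form a quorate partition without the
running node. That is one plausible way to end up with the fencing or
split brain risk mentioned above.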

> Can additional nodes be brought up without affecting the services
> running on the existing node (ie., without causing the new node to fence
> the existing node)?

Yes, in theory, but this is not a scenario we test constantly or support
as a full feature.

Clearly, all of the above assumes that the configs are correct at every
stage and that there are no other problems in between (for instance a
network issue or misconfigured iptables etc).

