[Linux-cluster] Fencing: Prevent rebooting halted node

Digimer lists at alteeve.ca
Wed Mar 21 13:18:12 UTC 2012


On 03/21/2012 06:43 AM, Nicolas Ecarnot wrote:
> Hi,
> 
> We are setting up a new cluster and we still have tests and questions.
> At present, our cluster is two nodes only, with a very simple setup.
> Fencing is done with fence_ipmilan, and the only action we use is reboot.
> Today, I tried to completely switch both nodes off, then boot up node 1.
> It perfectly boots up and serves as it should.
> But on detecting the missing peer, it fences node 2, which boots it up.
> 
> I would like to avoid that, and keep the stopped nodes stopped.
> 
> I don't know if there's a way I could improve my cluster.conf to do that?
> Could I improve my fencedevice command? I did not find many more
> options in the fence_ipmilan man page...
> Or is there a way to first do a test (?) before taking any further
> action?
> 
> I'd be glad to read your advice.

On first startup (whether started manually or via init.d), the node does
not know the state of its peer. As such, it can't safely start services
until it fences the peer. This is by design.

If it just assumed its peer was down and started services, it could
well cause a split-brain. The only way to avoid this is by putting the
peer into a known state, which is what fencing does (it ensures that the
peer is not running, so it is safe to proceed).

One option is to change the fence action from "reboot" to "off". This
would avoid booting the peer while still allowing the running machine to
proceed safely. Note though that, in the event of a real crash of a
node, the cluster will only power off the node. You will then have to
manually power the fenced node back on.
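
As a rough sketch of what that change might look like in cluster.conf
(I haven't seen your file, so the node name, device name, address and
credentials below are all placeholders), the action is set on the
per-node device entry. Depending on the version of the fence agents
you have, the attribute may be called "option" rather than "action",
so check your fence_ipmilan man page first.

```xml
<!-- Fragment only; all names, addresses and credentials are placeholders. -->
<clusternode name="node2.example.com" nodeid="2">
  <fence>
    <method name="ipmi">
      <!-- "off" powers the node down and leaves it down,
           instead of the default "reboot". -->
      <device name="ipmi_node2" action="off"/>
    </method>
  </fence>
</clusternode>

<!-- Matching device definition elsewhere in the same file. -->
<fencedevices>
  <fencedevice agent="fence_ipmilan" name="ipmi_node2"
               ipaddr="192.0.2.12" login="admin" passwd="secret"/>
</fencedevices>
```

You would make the same change for each node's device entry, so that
whichever node survives powers its peer off rather than rebooting it.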

-- 
Alteeve's Niche!
Madison Kelly        647-501-5200
Papers and Projects: https://alteeve.com



