[Linux-cluster] pull plug on node, service never relocates

Corey Kovacs corey.kovacs at gmail.com
Sat May 15 18:25:24 UTC 2010


The reason I was pointing at the fencing config is that the service
will only relocate once fenced is able to confirm that the offending
node has been fenced. If this can't happen, then services will not
relocate, since the cluster doesn't know the state of all the nodes. If
a node gets an anvil dropped on it, then it should stop responding
and the cluster should then try to invoke the fence on that node to
make sure that it is indeed dead, even if it only cycles the power
port for an already dead node.

Given your description, you should experience the same "problem" if you
simply turn the node off. Normally, when you turn the power off (not
pull the plug) and then boot the node, the cluster should either have
already fenced the node, or it will fence it as it's booting. It looks odd,
but it's correct, since the cluster has to get things to a known state.

After the fence and before the node boots, services should start
migrating. All of this you probably know, but it's worth saying anyway.

Basically, if your services only migrate after the node boots up, then
I believe fencing is not working properly. The services should migrate
while the node is booting or even before.
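
One way to check this is to look at what fenced last logged for the
failed node on a surviving member. The following is only a sketch: the
function name fence_outcome is made up for illustration, and it assumes
fenced writes lines of the form fence "<node>" success or
fence "<node>" failed to syslog, which may differ on your release.

```shell
# fence_outcome LOGFILE NODE  (hypothetical helper, not part of the cluster suite)
# Prints "fenced", "fence-failed", or "unknown" based on the last fenced
# log line that mentions NODE. The message format is an assumption.
fence_outcome() {
    log="$1"
    node="$2"
    # Keep only the most recent fenced line that names the node in quotes.
    last=$(grep 'fenced' "$log" | grep "\"$node\"" | tail -n 1)
    case "$last" in
        *success*) echo "fenced" ;;
        *failed*)  echo "fence-failed" ;;
        *)         echo "unknown" ;;
    esac
}
```

For example, fence_outcome /var/log/messages node2 (node2 being a
placeholder name). If it reports anything other than "fenced" while the
service is stuck, that would point at fencing rather than rgmanager.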

So it appears to me that when you switch the APC outlet off yourself, or
pull the plug on the node, you have the same condition.

The way to really test fencing is to watch the logs on one node and issue

cman_tool kill -n <cluster member>

which tells cman to kill that member, so the rest of the cluster has to fence it.
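
Roughly, the test could be scripted like this. It is a sketch under
assumptions: node2 is a placeholder node name, the run and fence_test
helpers are made up here, and the script only prints the commands
unless DRY_RUN=0 is set. Keep tail -f /var/log/messages open on another
member while it runs.

```shell
#!/bin/sh
# Manual fence test sketch; run from a surviving cluster member.
# TARGET is a placeholder node name. DRY_RUN=1 (the default) only prints.
TARGET="${TARGET:-node2}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

fence_test() {
    # 1. Ask cman to kill the target; the remaining members should fence it.
    run cman_tool kill -n "$1"
    # 2. Show membership and where the services ended up.
    run clustat
}

fence_test "$TARGET"
```

If the service has moved in the clustat output before the killed node
comes back, fencing and relocation are behaving as intended.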

One thought: can all your cluster nodes talk to the APC at all times?
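
A quick way to check that, sketched with placeholder names
(apc-pdu.example.com and node1..node3 are assumptions, as is the
apc_check_cmds helper): generate one ping per node and run them over
ssh. This only shows network reachability, not that the fence agent
can log in to the PDU.

```shell
# apc_check_cmds APC_HOST NODE...  (hypothetical helper)
# Prints one ssh+ping command per node; pipe the output to sh to run them.
apc_check_cmds() {
    apc="$1"
    shift
    for node in "$@"; do
        echo "ssh $node ping -c 3 $apc"
    done
}

# Placeholder host names; substitute your own.
apc_check_cmds apc-pdu.example.com node1 node2 node3
```

Running it as apc_check_cmds <apc> <nodes...> | sh executes the checks.
Any node that fails the ping would be unable to fence the others.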


-Corey




On Sat, May 15, 2010 at 5:50 PM, Dusty <dhoffutt at gmail.com> wrote:
> Fencing works great - no problems there. The APC PDU responds beautifully to
> any node's attempt to fence.
>
> The issue is this:
>
> The service only relocates after the fenced node reboots and rejoins the
> cluster. Then the service relocates to another node. This happens well and
> without fail.
>
> But what if the node that was fenced refuses to boot back up because, say an
> anvil fell out of the sky and smashed it, or its motherboard fried?
>
> This is what I am simulating by pulling the plug on a node that happens to
> be running a service. The service will not relocate until the failed node
> has rebooted.
>
> I don't want that. I want the service to relocate ASAP regardless of if the
> failed node reboots or not.
>
> Thank you so much for your consideration.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>



