[Linux-cluster] Cluster behaviour if one node is not reachable & fenceable any longer?
Nicolas Kukolja
nicolas.kukolja at gmail.com
Thu Jan 30 12:00:29 UTC 2014
Digimer <lists <at> alteeve.ca> writes:
> And this is the fundamental problem of stretch/geo-clusters.
>
> I am loath to recommend this, because it's soooo easy to screw it up in
> the heat of the moment, so please only ever do this after you are 100%
> sure the other node is dead:
>
> If you log into the 2 remaining nodes that are blocked (because of the
> inability to fence), you can type 'fence_ack_manual'. That will tell the
> cluster that you have manually confirmed the lost node is powered off.
>
> Again, USE THIS VERY CAREFULLY!
>
> It's tempting to make assumptions when you've got users and managers
> yelling at you to get services back up. So much so that Red Hat dropped
> 'fence_manual' entirely in RHEL 6 because it was too easy to blow things
> up. I cannot stress enough just how critical it is that you confirm
> that the remote location is truly off before doing this. If it's still
> on and you clear the fence action, then really bad things could happen
> when the link returns.
>
> digimer
Thanks a lot for your support and explanations. I will try to explain this
to my stakeholders.
One small question is still on my mind:
If, in a three-node scenario, one node is neither reachable nor fenceable,
but the other two nodes are still alive and able to communicate with each
other, where is the risk of a "split-brain" situation?
If the "lost" third node is still running but cannot be reached by the
others, it will disable the service because it has no contact with any
other node, right?
So if two nodes are still connected, isn't it guaranteed that the third
node is no longer providing the service?
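The reasoning above rests on majority quorum. As a rough illustration, here
is a minimal Python sketch of that vote arithmetic, assuming one vote per
node and the usual strict-majority rule (threshold = floor(expected / 2) + 1).
The function names are hypothetical and not part of any cluster stack:

```python
def quorum_threshold(expected_votes: int) -> int:
    # Strict majority: more than half of all expected votes.
    return expected_votes // 2 + 1


def partition_is_quorate(partition_votes: int, expected_votes: int) -> bool:
    # A partition may run services only if it reaches the threshold.
    return partition_votes >= quorum_threshold(expected_votes)


# Assumed setup: three nodes, one vote each; the link to node 3 is lost.
expected = 3
majority_side = 2  # the two nodes that can still see each other
isolated_side = 1  # the unreachable node, if it is still running

print(quorum_threshold(expected))                     # 2
print(partition_is_quorate(majority_side, expected))  # True
print(partition_is_quorate(isolated_side, expected))  # False
```

The sketch models only vote counting. It does not model whether the isolated
node has actually noticed its loss of quorum and stopped its services.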
Kind regards,
Nicolas