[Linux-cluster] Cluster behaviour if one node is not reachable & fenceable any longer?

Digimer lists at alteeve.ca
Wed Jan 29 17:46:44 UTC 2014


On 29/01/14 12:42 PM, Nicolas Kukolja wrote:
> Digimer <lists <at> alteeve.ca> writes:
>
>>
>> 99% of the time, I agree totally. Logs and configs are super helpful. In
>> this case though, I am pretty sure I know exactly what's happening. :)
>>
>> digimer
>
> Thanks for the explanation, digimer. You got exactly what I mean and what
> happens. Unfortunately, that was what I was afraid of...
>
> The three nodes in my scenario are located about 200km from each other.
> If one of the nodes with all infrastructure around it (PDUs, Switches,
> IPMI...) is not reachable any longer because of a power outage or a full
> network outage at this location, switching a PDU is not possible either...
>
> That would mean that in this (very probable) case, the cluster will not
> help me?
>
> Do you have any suggestions for what I can do to work around this case?
>
> Kind regards,
> Nicolas

And this is the fundamental problem of stretch/geo-clusters.

I am loath to recommend this, because it's soooo easy to screw it up in 
the heat of the moment, so please only ever do this after you are 100% 
sure the other node is dead:

If you log into the 2 remaining nodes that are blocked (because of the 
inability to fence), you can type 'fence_ack_manual'. That will tell the 
cluster that you have manually confirmed the lost node is powered off.

Again, USE THIS VERY CAREFULLY!

It's tempting to make assumptions when you've got users and managers 
yelling at you to get services back up. So much so that Red Hat dropped 
'fence_manual' entirely in RHEL 6 because it was too easy to blow things 
up. I cannot stress enough just how critical it is that you confirm 
that the remote location is truly off before doing this. If it's still 
on and you clear the fence action, then really bad things could happen 
when the link returns (both sides may believe they own the same services).
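
To make the "confirm first" step harder to skip under pressure, here is a 
minimal sketch of a guard you could put in front of the command. It only 
builds the command line, it does not run anything. The helper names 
(confirm_off, build_ack_command) are hypothetical, and passing the node name 
as an argument to 'fence_ack_manual' is an assumption; check the man page on 
your release for the exact syntax.

```python
def confirm_off(node, answer):
    """True only if the operator typed the exact name of the node
    they have verified is powered off (hypothetical safety check)."""
    return answer.strip() == node


def build_ack_command(node, confirmed_off):
    """Return the argv for the manual fence acknowledgement,
    or raise if the power-off has not been confirmed."""
    if not confirmed_off:
        raise RuntimeError(
            "refusing to ack fence for %s: power-off not confirmed" % node
        )
    # Assumption: the node name is passed as an argument; the exact
    # syntax may differ between releases.
    return ["fence_ack_manual", node]
```

You would call confirm_off() with whatever the operator typed, pass the 
result to build_ack_command(), and then run the returned command by hand on 
each blocked node.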

digimer

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



