[Linux-cluster] qdisk WITHOUT fencing

Mon Jun 21 09:20:34 UTC 2010

On 06/21/2010 08:52 AM, Kaloyan Kovachev wrote:
> On Fri, 18 Jun 2010 18:15:09 +0200, brem belguebli
> <brem.belguebli at gmail.com>  wrote:
>> How do you deal with fencing when the intersite interconnects (SAN and
>> LAN) are the cause of the failure ?
>>
>
> GPRS or the good old modem over a phone line?

That isn't going to work if the whole site is down for whatever reason 
(unlikely as it may be).

To protect yourself from the 100% outage of a remote site, the only sane 
way of approaching it that I can think of is to do something like the 
following:

1) Make each node fence itself off from the failed node using iptables 
or some other firewalling method. The SAN should also be prevented from 
allowing the booted out node back onto it.

2) Fail over the IP address or DNS name of the service. Since it's 
across different sites, you are likely to have to use something like RIP 
to re-route the IPs, so DNS with a short TTL may well be an easier and 
possibly safer option. It'll mean some downtime, but probably less than 
any manual intervention in an unplanned case.
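
As a rough illustration of steps 1 and 2, here is a minimal Python sketch 
that only builds the commands rather than running them. The function names, 
the example addresses and the 60 second TTL are all assumptions made for 
illustration; the iptables rules simply drop traffic to and from the failed 
peer, and the DNS part produces input for nsupdate, assuming the zone 
accepts dynamic updates. It does not cover the SAN side at all.

```python
import ipaddress


def build_isolation_rules(peer_ip):
    """Return iptables argv lists that drop all traffic to/from peer_ip.

    Nothing is executed here; the caller decides whether to run them.
    """
    # Validate the address so junk never reaches the command line.
    peer = str(ipaddress.ip_address(peer_ip))
    return [
        ["iptables", "-I", "INPUT", "-s", peer, "-j", "DROP"],
        ["iptables", "-I", "OUTPUT", "-d", peer, "-j", "DROP"],
    ]


def build_nsupdate_script(name, new_ip, ttl=60):
    """Return nsupdate input that repoints `name` at new_ip.

    The short TTL (an assumed value) keeps client caches from holding
    on to the old address for long.
    """
    addr = ipaddress.ip_address(new_ip)
    rtype = "A" if addr.version == 4 else "AAAA"
    lines = [
        f"update delete {name} {rtype}",
        f"update add {name} {ttl} {rtype} {addr}",
        "send",
    ]
    return "\n".join(lines) + "\n"
```

The surviving node would run the returned iptables commands as root (for 
example via subprocess.run) and pipe the generated script into nsupdate. 
Keeping command construction separate from execution makes it easy to do a 
dry run before trusting it with a real failover.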

It's not entirely ideal, but it's about as good as it is likely to get. 
And you can write a fencing agent to do something like this easily enough.
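
A fencing agent along those lines might look roughly like the sketch below. 
It assumes the usual convention of the agent receiving name=value options 
on standard input, one per line; the option names used here (ipaddr, 
action) and the function names are assumptions for illustration, not a 
tested agent. By default it only prints what it would do.

```python
import ipaddress
import subprocess


def parse_agent_options(text):
    """Parse name=value lines; blank lines and '#' comments are skipped."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def plan_fence(options):
    """Translate parsed options into iptables argv lists.

    'off' inserts the DROP rules (fence the peer), 'on' deletes them.
    """
    peer = str(ipaddress.ip_address(options["ipaddr"]))
    action = options.get("action", "off")
    flag = {"off": "-I", "on": "-D"}.get(action)
    if flag is None:
        raise ValueError(f"unsupported action: {action}")
    return [
        ["iptables", flag, "INPUT", "-s", peer, "-j", "DROP"],
        ["iptables", flag, "OUTPUT", "-d", peer, "-j", "DROP"],
    ]


def run_agent(stream, execute=False):
    """Read options from stream; print the plan, or run it if execute=True.

    Returns 0 on success and 1 on failure, suitable as an exit status.
    """
    try:
        commands = plan_fence(parse_agent_options(stream.read()))
        for cmd in commands:
            if execute:
                subprocess.run(cmd, check=True)
            else:
                print(" ".join(cmd))
    except (KeyError, ValueError, OSError, subprocess.CalledProcessError):
        return 1
    return 0
```

A real agent would end with something like 
sys.exit(run_agent(sys.stdin, execute=True)), and would also have to deal 
with the SAN side and the IP or DNS failover, which this sketch leaves out.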

Gordan