[Linux-cluster] Starter Cluster / GFS

Gordan Bobic gordan at bobich.net
Thu Nov 11 09:27:41 UTC 2010


Digimer wrote:
> On 10-11-10 10:29 PM, Jankowski, Chris wrote:
>> Digimer,
>>
>> 1.
>> Digimer wrote:
>>>>> Both partitions will try to fence the other, but the slower will lose and get fenced before it can fence.
>> Well, this is certainly not my experience in dealing with modern rack mounted or blade servers where you use iLO (on HP) or DRAC (on Dell).
>>
>> What actually happens in two node clusters is that both servers issue the fence request to the iLO or DRAC. It gets processed and *both* servers get powered off.  Ouch!!  Your 100% HA cluster becomes 100% dead cluster.
> 
> That is somewhat frightening. My experience is limited to stock IPMI and
> Node Assassin. I've not seen a situation where both die. I'd strongly
> suggest that a bug be filed.

It's actually fairly predictable and quite common. If the nodes lose
connectivity to each other while both are still alive (e.g. the cluster
service switch fails), you will get this sort of shoot-out. The cause
is that most out-of-band power-off mechanisms have an inherent lag of
several seconds between the power-off command being issued and the
machine actually powering off. During that race window, both machines
can issue a remote power-off against the other before either of them
actually goes down.
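
To make the race window concrete, here is a rough sketch of the timing
in Python. It is only a toy model, not how any fence agent is
implemented: the function name, the node labels "A" and "B", and the
single shared lag value are all assumptions made for illustration.

```python
def fence_outcome(issue_a, issue_b, lag):
    """Return the set of nodes that end up powered off.

    issue_a, issue_b: time in seconds at which node A / node B would
        send its power-off request for the other node (hypothetical).
    lag: seconds between a request being sent and the target actually
        losing power (assumed equal in both directions).
    """
    dead = set()
    if issue_a <= issue_b:
        # A fires first (ties go to A), so B will go down at issue_a + lag.
        dead.add("B")
        # B can still fire back if it does so before it loses power.
        if issue_b < issue_a + lag:
            dead.add("A")
    else:
        # B fires first, so A will go down at issue_b + lag.
        dead.add("A")
        if issue_a < issue_b + lag:
            dead.add("B")
    return dead


# Requests 0.5 s apart with a 3 s power-off lag: both nodes die.
print(sorted(fence_outcome(0.0, 0.5, 3.0)))
# Requests 5 s apart with the same lag: only the slower node dies.
print(sorted(fence_outcome(0.0, 5.0, 3.0)))
```

In this model both nodes go down whenever the two requests are sent
closer together than the lag, which is the usual case when both sides
notice the loss of connectivity at about the same time.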

Gordan



