[Linux-cluster] Starter Cluster / GFS

Digimer linux at alteeve.com
Thu Nov 11 16:44:14 UTC 2010


On 10-11-11 04:23 AM, Gordan Bobic wrote:
> Jankowski, Chris wrote:
>> Digimer,
>>
>> 1.
>> Digimer wrote:
>>>>> Both partitions will try to fence the other, but the slower
>>>>> will lose and get fenced before it can fence.
>>
>> Well, this is certainly not my experience in dealing with modern
>> rack mounted or blade servers where you use iLO (on HP) or DRAC (on
>> Dell).
>>
>> What actually happens in two node clusters is that both servers
>> issue the fence request to the iLO or DRAC. It gets processed
>> and *both* servers get powered off.  Ouch!!  Your 100% HA cluster
>> becomes 100% dead cluster.
> 
> Indeed, I've seen this, too, on a range of hardware. My quick and dirty
> solution was to doctor the fencing agent to add a different sleep() on
> each node, in order of survivor preference. There may be a setting in
> cluster.conf that can be used to achieve the same effect, can't remember
> off the top of my head.
> 
> Gordan

I've not seen such an option, though I make no claims to complete
knowledge of the options available. I do know that there are per-device
fence options (that is, IPMI has a set of options that differs from
DRAC, etc). So perhaps there is an option there.
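
To make the staggered-delay idea Gordan describes concrete, here is a
minimal sketch in Python. It is not taken from any real fence agent; the
function name `surviving_node` and the delay values are hypothetical, and
it assumes a fence call takes effect instantly once a node's delay has
expired.

```python
def surviving_node(delays):
    """Return the node expected to win the fence race.

    `delays` maps a node name to the number of seconds that node waits
    before it calls its fence agent. The node with the shortest delay
    fires first, so (assuming the fence is instant) its peer is powered
    off before the peer's own timer expires.
    """
    if len(set(delays.values())) != len(delays):
        # Equal delays recreate the original problem: both nodes fire
        # at the same moment and both may be powered off.
        raise ValueError("each node needs a different delay")
    return min(delays, key=delays.get)


# Hypothetical preference: node1 should survive a split.
print(surviving_node({"node1": 0, "node2": 5}))
```

The gap between the delays would need to be longer than the time the
fence device takes to actually cut power, otherwise the slower node can
still get its request in.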

I am very curious to know how this scenario can happen. As I had
previously understood it, this should simply not be possible. Obviously
it is though... The only thing I can think of is where a fence device is
external to the nodes and allows for multiple fence calls at the same
time. I would expect that any fence device should terminate a node
nearly instantly. If it doesn't or can't, then I would suggest that it
not accept a second fence request until after the pending one completes.
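
As a rough illustration of that suggestion, the toy model below compares
a device that accepts every fence request with one that refuses a second
request while the first is still pending. The `FenceDevice` class and
its methods are made up for this sketch and do not describe how iLO,
DRAC or any real device behaves.

```python
class FenceDevice:
    """Toy model of a power controller shared by all nodes (hypothetical)."""

    def __init__(self, nodes, serialise):
        self.powered_on = {name: True for name in nodes}
        self.serialise = serialise
        self.pending = []  # accepted (requester, target) pairs, not yet executed

    def request_fence(self, requester, target):
        # A serialising device refuses new requests while one is pending.
        if self.serialise and self.pending:
            return False
        self.pending.append((requester, target))
        return True

    def process(self):
        # Execute every accepted request; the device does not check
        # whether the requester is itself about to be powered off.
        for _requester, target in self.pending:
            self.powered_on[target] = False
        self.pending.clear()


def split_brain(serialise):
    """Both nodes ask for the other to be fenced before either request runs."""
    device = FenceDevice(["node1", "node2"], serialise)
    device.request_fence("node1", "node2")
    device.request_fence("node2", "node1")
    device.process()
    return device.powered_on


print(split_brain(serialise=False))  # both requests accepted
print(split_brain(serialise=True))   # second request refused
```

In the non-serialising case both nodes end up powered off, which matches
what Chris and Gordan report. In the serialising case only the node that
lost the race is powered off.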

-- 
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org



