[Linux-cluster] Starter Cluster / GFS
linux at alteeve.com
Thu Nov 11 16:44:14 UTC 2010
On 10-11-11 04:23 AM, Gordan Bobic wrote:
> Jankowski, Chris wrote:
>> Digimer wrote:
>>>>> Both partitions will try to fence the other, but the slower
>>>>> will lose and get fenced before it can fence.
>> Well, this is certainly not my experience in dealing with modern
>> rack-mounted or blade servers where you use iLO (on HP) or DRAC (on Dell).
>> What actually happens in two node clusters is that both servers
>> issue the fence request to the iLO or DRAC. It gets processed
>> and *both* servers get powered off. Ouch!! Your 100% HA cluster
>> becomes 100% dead cluster.
> Indeed, I've seen this, too, on a range of hardware. My quick and dirty
> solution was to doctor the fencing agent to add a different sleep() on
> each node, in order of survivor preference. There may be a setting in
> cluster.conf that can be used to achieve the same effect, can't remember
> off the top of my head.
I've not seen such an option, though I make no claims to complete
knowledge of the options available. I do know that there are per-device
fence options (for example, IPMI has a different set of options than
DRAC). So perhaps there is an option there.
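To make the race and the staggered sleep() workaround concrete, here is a
minimal Python simulation. It is only a sketch under assumptions: the node
names, fence_delay(), simulate_split() and the fixed fence_duration are all
hypothetical, and a real fence agent would talk to iLO/DRAC/IPMI instead of
updating a dictionary. Each node waits its preference-based delay, then
tries to fence the other, unless it has already been powered off.

```python
def fence_delay(node: str, survivor_preference: list[str], step_seconds: float) -> float:
    """Hypothetical helper: the preferred survivor waits 0 s, the next
    waits step_seconds, the one after that 2 * step_seconds, and so on."""
    return survivor_preference.index(node) * step_seconds


def simulate_split(nodes: list[str], delays: dict[str, float],
                   fence_duration: float) -> set[str]:
    """Simulate every node fencing every other node after its own delay.

    Assumes a fence takes fence_duration seconds to cut power, and that a
    node which has lost power cannot send a fence request. Returns the set
    of nodes still powered on at the end.
    """
    # Time at which each node loses power; None means it is still running.
    power_off_at: dict[str, float | None] = {n: None for n in nodes}
    # Handle nodes in the order they would send their fence request.
    for node in sorted(nodes, key=lambda n: delays[n]):
        send_time = delays[node]
        off = power_off_at[node]
        if off is not None and off <= send_time:
            continue  # already powered off, so it never gets to fence
        for target in nodes:
            if target == node:
                continue
            lands_at = send_time + fence_duration
            current = power_off_at[target]
            if current is None or lands_at < current:
                power_off_at[target] = lands_at
    return {n for n, t in power_off_at.items() if t is None}


if __name__ == "__main__":
    nodes = ["node1", "node2"]
    # Both nodes fence at the same moment: both get powered off.
    print(simulate_split(nodes, {"node1": 0, "node2": 0}, fence_duration=5))  # set()
    # Stagger by more than the fence takes to land: node1 survives.
    staggered = {n: fence_delay(n, nodes, step_seconds=10) for n in nodes}
    print(simulate_split(nodes, staggered, fence_duration=5))  # {'node1'}
```

Note that the stagger only helps if it is longer than the time the device
needs to actually cut power; with a 3 s stagger against a 5 s fence, the
second node still gets its request out and both nodes die.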
I am very curious to know how this scenario can happen. As I had
previously understood it, this should simply not be possible. Obviously
it is though... The only thing I can think of is a fence device that is
external to the nodes and accepts multiple fence calls at the same time.
I would expect any fence device to terminate a node nearly instantly. If
it doesn't or can't, then I would suggest that it not accept a second
fence request until the pending one completes.
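Here is a minimal sketch of what I mean, in Python. The
SerializingFenceDevice class and its begin_fence()/complete_fence() methods
are hypothetical and only model the behaviour; real devices implement this
(or don't) in their firmware. A fence request is accepted only if no other
request is pending and the requesting node still has power.

```python
import threading


class SerializingFenceDevice:
    """Hypothetical model of an external fence device that handles one
    fence request at a time and refuses any that arrive while one is
    still pending."""

    def __init__(self, nodes: list[str]) -> None:
        self._busy = threading.Lock()  # held while a fence is in flight
        self.powered = {n: True for n in nodes}

    def begin_fence(self, requester: str, target: str) -> bool:
        # Refuse immediately if another fence operation is still pending.
        if not self._busy.acquire(blocking=False):
            return False
        # Refuse requests from a node that has already been powered off.
        if not self.powered[requester]:
            self._busy.release()
            return False
        return True

    def complete_fence(self, target: str) -> None:
        # Must only be called after a successful begin_fence().
        self.powered[target] = False
        self._busy.release()


if __name__ == "__main__":
    dev = SerializingFenceDevice(["node1", "node2"])
    print(dev.begin_fence("node1", "node2"))  # True: first request accepted
    print(dev.begin_fence("node2", "node1"))  # False: arrives while pending
    dev.complete_fence("node2")
    print(dev.begin_fence("node2", "node1"))  # False: node2 is now off
```

With this behaviour the two-node race can only ever power off one node,
no matter how closely the two requests arrive.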
E-Mail: digimer at alteeve.com
Node Assassin: http://nodeassassin.org