[Linux-cluster] Halt nodes in cluster with cable disconnect

Digimer linux at alteeve.com
Fri Jan 27 20:31:48 UTC 2012


On 01/27/2012 03:20 PM, yvette hirth wrote:
> Digimer wrote:
> 
>> You can crash the machine with this;
>>
>> echo c > /proc/sysrq-trigger
> 
> will
> 
> ifconfig ethx down  (where "x" = heartbeat ethernet interface numbah)
> 
> do the same thing?
> 
> yvette

Nope. The scenario is caused by both nodes being alive but losing the
ability to talk to one another on the storage channel. Whether that is
because a cable is unplugged or because of a bad firewall rule, the
result is the same: both nodes see a failure at the same time and call
their fence handlers at the same time. The one with the sleep will delay
and thus always lose (and be the fence victim).
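
To make the race concrete, here is a toy model in shell, not anything a
real fence agent runs. The function name, the node names and the delay
values are all made up for illustration; the only rule it encodes is that
the node that waits longer before fencing is the one that gets fenced.

```shell
#!/bin/sh
# Toy model of a fence race (hypothetical names, not a real agent).
# $1 = seconds node_a sleeps before fencing node_b
# $2 = seconds node_b sleeps before fencing node_a
# Prints the node that loses the race, i.e. the fence victim.
fence_victim() {
    if [ "$1" -lt "$2" ]; then
        # node_a fires first, so node_b is fenced.
        echo "node_b"
    elif [ "$2" -lt "$1" ]; then
        # node_b fires first, so node_a is fenced.
        echo "node_a"
    else
        # Equal delays: the outcome is a coin toss.
        echo "undetermined"
    fi
}

fence_victim 0 5   # prints: node_b
```

With equal delays the model reports "undetermined", which is the case the
sleep is there to avoid.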

The idea behind sending "c" to sysrq-trigger is that it crashes the
kernel outright. The crashed node will not trigger its fence handler, or
do anything else for that matter. Meanwhile, the node with the sleep will
detect the fault, call the agent, sleep for a few seconds, then proceed
to fence the crashed node. This more accurately simulates an actual fault
in the primary node and confirms that the node with the sleep will in
fact fence successfully.
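
A guarded way to run that test might look like the sketch below. The
crash_node wrapper and the DRY_RUN switch are my own invention for
safety, not part of any cluster tool; the only real piece is the write to
/proc/sysrq-trigger, which needs root. Run it on the node that does NOT
have the sleep, and only on a machine you are prepared to lose.

```shell
#!/bin/sh
# Sketch only: crash this node so the peer has to fence it.
# DRY_RUN is a hypothetical safety switch; it defaults to 1 (do nothing).
crash_node() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "dry run: would write 'c' to /proc/sysrq-trigger"
        return 0
    fi
    # Crashes the kernel immediately; nothing after this line runs.
    echo c > /proc/sysrq-trigger
}

# Safe rehearsal; call with DRY_RUN=0 as root to actually crash the node.
DRY_RUN=1 crash_node
```

While the victim is down, watch the logs on the surviving node to see the
failure detected, the delay, and then the fence call.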

-- 
Digimer
E-Mail:              digimer at alteeve.com
Papers and Projects: https://alteeve.com



