[Linux-cluster] Halt nodes in cluster with cable disconnect

Digimer linux at alteeve.com
Fri Jan 27 17:07:46 UTC 2012


On 01/27/2012 11:51 AM, Miguel Angel Guerrero wrote:
> Hi Digimer and Emmanuel
> 
> I was trying some tests with my cluster configuration and, in short:
> 
> 1. I think something's wrong with my configuration, because when a
> real disconnection (i.e. unplugging the cable) happens on the node which
> does not have the sleep in the script (node A), the other node (node
> B) is always stonith'ed, when obviously the node which should reboot
> is node A. This is important to me because I want to know how the
> cluster should behave when a failure of the switch port or the NIC
> occurs.

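A broken link is a broken link. The cluster has no idea whose cable has
been unplugged, only that the nodes can no longer talk to one another.
Both nodes then try to fence each other, and the node without the sleep
gets there first every time. So the same node being fenced is expected.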
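
As a rough illustration of that race, here is a toy model in shell. It
is only a sketch: fence_race_survivor is a made-up function name, the
delays are example values, and nothing here talks to a real cluster or
fence device.

```shell
#!/bin/sh
# Toy model of a two-node fence race after the link between nodes fails.
# Each argument is the delay (in seconds) before that node's fence call
# completes. The values are illustrative assumptions, not cluster state.
fence_race_survivor() {
    delay_a=$1
    delay_b=$2
    if [ "$delay_a" -lt "$delay_b" ]; then
        echo "nodeA"          # A fences B before B's delay expires
    elif [ "$delay_b" -lt "$delay_a" ]; then
        echo "nodeB"          # B fences A before A's delay expires
    else
        echo "undetermined"   # equal delays: no predictable survivor
    fi
}

# Node A has no sleep, node B sleeps 10 seconds. Whose cable was pulled
# is not an input at all, which is why the outcome never changes.
fence_race_survivor 0 10
```

With those example values the model prints nodeA as the survivor, which
matches what you saw: node B is fenced regardless of which cable you pull.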

If you want to test an actual failure of the node to confirm that the
node with the sleep will win, hang the nodeA machine.

You can crash the machine with this (as root):

echo c > /proc/sysrq-trigger

NodeB will lose contact with NodeA, call its fence, sleep and then
finish the fence call. NodeA will be completely hung, so it won't even
try to fence and will stay hung until fenced by nodeB.
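
If you are worried about crashing the wrong machine, a small guard
around that command can help. This is only a sketch: crash_this_node is
a hypothetical helper, and the hostname you pass in is whatever your
node A is actually called.

```shell
#!/bin/sh
# Hypothetical guard around the sysrq crash trigger.
# Usage: crash_this_node <expected-hostname>
crash_this_node() {
    expected=$1
    actual=$(uname -n)
    # Refuse to run on any machine other than the one named.
    if [ "$actual" != "$expected" ]; then
        echo "refusing: this is $actual, not $expected" >&2
        return 1
    fi
    # Writing to /proc/sysrq-trigger requires root.
    if [ "$(id -u)" -ne 0 ]; then
        echo "refusing: must be root" >&2
        return 1
    fi
    sync                          # flush dirty buffers before the crash
    echo c > /proc/sysrq-trigger  # crashes the kernel immediately
}
```

Define the function in a root shell on node A and call it with node A's
hostname. Called anywhere else, it returns non-zero and does nothing.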

> 2.  @Emmanuel, could you point me to redhat's documentation about
> this? I tried your solution like this:
> 
> <fence_daemon clean_start="0" post_fail_delay="10" post_join_delay="30"/>
> 
> But it still failed. Is there another way?
> 
> 3. Another solution in this thread is to add a quorum disk to the
> cluster. I started setting this up using this guide:
> http://www.skau.dk/index.php?option=com_content&view=article&id=34:rhcs-cluster-using-iscsi&catid=4:cases-to-explain&Itemid=3
> 
> But I need to replicate the data using only two nodes, and it seems
> that this solution requires three. Could somebody tell me if I'm doing
> it right or wrong? Does this cause conflicts when using DRBD?

Using qdisk on DRBD is a bad idea. Consider a split-brain scenario: the
qdisk could effectively be duplicated, with each node seeing its own
copy, which completely defeats its purpose.

-- 
Digimer
E-Mail:              digimer at alteeve.com
Papers and Projects: https://alteeve.com



