[Linux-cluster] fencing problem in 2 node cluster using apc fence device

Gary Lua garylua at singnet.com.sg
Wed Aug 16 14:37:49 UTC 2006


Hi,

I'm currently configuring fencing devices for the 2 nodes of my RHEL4 
cluster. The description is quite long, so please bear with me.

I have 2 nodes (let's call them stone1 and stone2) and 2 APC fencing 
devices (pdu1 and pdu2, both APC 7952 devices). Both stone1 and stone2 
have dual power supplies. Stone1's power supplies are connected to outlet 
13 of pdu1 and pdu2. Stone2's power supplies are connected to outlet 20 
of both PDUs. My question is: during the fencing configuration for 
each node, I need to specify which fence devices to add to the fence 
level of each node. Is it correct to specify for stone1 as follows: 
pdu1 -> port=13, switch=1, pdu2 -> port=13, switch=2? And likewise for 
stone2: pdu1 -> port=20, switch=1, pdu2 -> port=20, switch=2?
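
As a rough sketch, the fencing setup described above would correspond to 
something like the following fragment of /etc/cluster/cluster.conf. This 
assumes the fence_apc agent is used and that one fence level maps to one 
<method> element; the method name, votes, and the ipaddr, login and passwd 
values are placeholders, not real settings. The port and switch values are 
the ones from the question, so this shows the configuration being asked 
about, not a confirmed working one.

```xml
<!-- Fragment of the <cluster> element; placeholder values are in capitals -->
<clusternodes>
  <clusternode name="stone1" votes="1">
    <fence>
      <method name="1">
        <!-- stone1: outlet 13 on each PDU, both listed in the same level -->
        <device name="pdu1" port="13" switch="1"/>
        <device name="pdu2" port="13" switch="2"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="stone2" votes="1">
    <fence>
      <method name="1">
        <!-- stone2: outlet 20 on each PDU -->
        <device name="pdu1" port="20" switch="1"/>
        <device name="pdu2" port="20" switch="2"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <fencedevice agent="fence_apc" name="pdu1" ipaddr="PDU1_ADDRESS" login="LOGIN" passwd="PASSWORD"/>
  <fencedevice agent="fence_apc" name="pdu2" ipaddr="PDU2_ADDRESS" login="LOGIN" passwd="PASSWORD"/>
</fencedevices>
```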

After configuring as mentioned above, with both nodes in the cluster 
running and my application running on stone1, I pulled out the ethernet 
cables of stone1 to simulate the server going down. I expected my 
application to fail over to stone2 and stone1 to be fenced (i.e., 
rebooted or shut down). However, what happened is that my application 
was started on stone2, and stone1 was not fenced. In fact, when I 
reconnected the cables, my application was still running on stone1! It 
seems that there were 2 instances of my application running, one on 
each of stone1 and stone2.

Why has the fencing failed? I've read somewhere that the acpid service 
plays a part and that I need to disable it. Is that true? When I check my 
/var/log/messages, I see a cman :sendmsg failed -101 error. What does 
this mean?

I've been trying to solve this problem for the last few days, but to no 
avail. Any advice will be appreciated.
