[Linux-cluster] Cluster Fence not working on all nodes

Bruno Deschamps deepjm at gmail.com
Thu Aug 21 13:43:48 UTC 2008


Hi,

I am testing my cluster nodes on an IBM BladeCenter, but I have a problem:
fencing does not work correctly.
I have 2 nodes:
node1 : 10.0.20.34
node2 : 10.0.20.35

When I fence the nodes manually with the commands fence_node node1 and
fence_node node2, it works correctly.
When a service is running on node2 and I disconnect node1 from the
network (to force the fence), it works correctly too.
My problem is that when a service is running on node1 and I disconnect node2
from the network, node1 does not fence the machine.

Below are the logs from both servers, where you can see the fencing at work.
On node2 you can see that the fence returns success, but not on node1.

Has anyone experienced this kind of problem?
Any suggestions on what I should do?


I ran fence_tool dump on the node that does not fence, and it shows this output:

1219238621 stop default
1219238621 start default 4 members 1
1219238621 do_recovery stop 1 start 4 finish 1
1219238621 add node 2 to list 1
1219238621 averting fence of node 10.0.20.35
1219238621 finish default 4
1219238681 client 4: dump
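
For anyone reading the dump above, here is a minimal Python sketch that converts the leading epoch timestamps to UTC and picks out the "averting fence of node" entries. It assumes each line has the form "<epoch seconds> <message>", as in the output pasted above; the function names are made up for illustration and are not part of fence_tool or fenced.

```python
from datetime import datetime, timezone

# Message prefix taken from the dump pasted above.
AVERT_MARKER = "averting fence of node"

# The fence_tool dump output from this post, copied verbatim.
DUMP = """\
1219238621 stop default
1219238621 start default 4 members 1
1219238621 do_recovery stop 1 start 4 finish 1
1219238621 add node 2 to list 1
1219238621 averting fence of node 10.0.20.35
1219238621 finish default 4
1219238681 client 4: dump
"""


def parse_dump(text):
    """Return (utc_time, message) pairs, assuming '<epoch> <message>' lines."""
    entries = []
    for line in text.splitlines():
        stamp, _, message = line.strip().partition(" ")
        if not stamp.isdigit():
            continue  # skip blank lines or anything that is not a dump entry
        when = datetime.fromtimestamp(int(stamp), tz=timezone.utc)
        entries.append((when, message))
    return entries


def averted_nodes(text):
    """Return (utc_time, node) for every 'averting fence of node' entry."""
    return [
        (when, message[len(AVERT_MARKER):].strip())
        for when, message in parse_dump(text)
        if message.startswith(AVERT_MARKER)
    ]


for when, node in averted_nodes(DUMP):
    print(when.isoformat(), "fence averted for", node)
```

This only reformats what fenced already printed; it does not explain why the fence was averted.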


Does anyone know why it shows the message "averting fence of node" and does
not fence the node?

Thanks for the help.



node1:
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering GATHER state from 0.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Creating commit token because I am the rep.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Saving state aru 57 high seq received 57
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Storing new sequence id for ring 1edb5c
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering COMMIT state.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering RECOVERY state.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] position [0] member 10.0.20.34:
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] previous ring seq 2022232 rep 10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] aru 57 high delivered 57 received flag 1
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Did not need to originate any messages in recovery.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Sending initial ORF token
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] New Configuration:
Aug 18 11:11:46 node1 kernel: dlm: closing connection to node 2
*Aug 18 11:11:46 node1 fenced: 10.0.20.35 not a cluster member after 0 sec post_fail_delay*
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.34)
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Left:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.35)
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Joined:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] New Configuration:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.34)
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Left:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Joined:
Aug 18 11:11:46 node1 openais[3515]: [SYNC ] This node is within the primary component and will provide service.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering OPERATIONAL state.
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] got nodejoin message 10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [CPG  ] got joinlist message from node 1



node2:
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering GATHER state from 0.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Creating commit token because I am the rep.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Saving state aru 53 high seq received 53
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Storing new sequence id for ring 1edb7c
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering COMMIT state.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering RECOVERY state.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] position [0] member 10.0.20.35:
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] previous ring seq 2022264 rep 10.0.20.34
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] aru 53 high delivered 53 received flag 1
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Did not need to originate any messages in recovery.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Sending initial ORF token
Aug 18 15:55:52 node2 openais[5232]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 15:55:52 node2 openais[5232]: [CLM  ] New Configuration:
Aug 18 15:55:52 node2 kernel: dlm: closing connection to node 1
*Aug 18 15:55:52 node2 fenced[5248]: 10.0.20.34 not a cluster member after 0 sec post_fail_delay*
Aug 18 15:55:52 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.35)
Aug 18 15:55:52 node2 fenced[5248]: fencing node "10.0.20.34"
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Left:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.34)
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Joined:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] New Configuration:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.35)
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Left:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Joined:
Aug 18 15:55:53 node2 openais[5232]: [SYNC ] This node is within the primary component and will provide service.
Aug 18 15:55:53 node2 openais[5232]: [TOTEM] entering OPERATIONAL state.
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] got nodejoin message 10.0.20.35
Aug 18 15:55:53 node2 openais[5232]: [CPG  ] got joinlist message from node 2
*Aug 18 15:55:59 node2 fenced[5248]: fence "10.0.20.34" success*
Aug 18 15:56:00 node2 clurgmgrd[5507]: <notice> Taking over service service:FirewallClusta from down member 10.0.20.34
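
To compare the two logs quickly, the fenced lines can be pulled out and labelled. The following is a rough Python sketch, assuming syslog lines of the form shown above with each message on a single line; the labels ("detected", "started", "success") and the function names are my own, not fenced terminology.

```python
import re

# Matches "fenced: ..." (node1) and "fenced[5248]: ..." (node2).
FENCED_LINE = re.compile(r"\bfenced(?:\[\d+\])?:\s+(.*)")

# fenced lines copied from the two logs in this post, one message per line.
NODE1_LOG = """
*Aug 18 11:11:46 node1 fenced: 10.0.20.35 not a cluster member after 0 sec post_fail_delay*
"""
NODE2_LOG = """
*Aug 18 15:55:52 node2 fenced[5248]: 10.0.20.34 not a cluster member after 0 sec post_fail_delay*
Aug 18 15:55:52 node2 fenced[5248]: fencing node "10.0.20.34"
*Aug 18 15:55:59 node2 fenced[5248]: fence "10.0.20.34" success*
"""


def classify(message):
    """Label a fenced message; the labels are illustrative only."""
    if "not a cluster member" in message:
        return "detected"
    if message.startswith("fencing node"):
        return "started"
    if message.startswith("fence ") and message.endswith("success"):
        return "success"
    return "other"


def summarize(log_text):
    """Return the labels of all fenced messages in log order."""
    labels = []
    for line in log_text.splitlines():
        # Drop the '*' emphasis markers that surround some lines in the mail.
        match = FENCED_LINE.search(line.strip().strip("*"))
        if match:
            labels.append(classify(match.group(1).strip()))
    return labels


def fence_completed(log_text):
    """True if the log contains a fence success message."""
    return "success" in summarize(log_text)


print("node1:", summarize(NODE1_LOG))
print("node2:", summarize(NODE2_LOG))
```

On the excerpts above, node1 only reaches the detection step, while node2 goes on to start the fence and report success, which matches the behaviour described at the top of this mail.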