[Linux-cluster] Cluster fence doesn't work on all nodes

Bruno Frensch Deschamps bruno at redix.com.br
Thu Aug 21 13:46:49 UTC 2008


Hi

I am testing my cluster nodes on an IBM BladeCenter, but I have a 
problem: fencing does not work correctly.
I have 2 nodes:
node1 : 10.0.20.34
node2 : 10.0.20.35

When I fence the nodes manually with the commands fence_node node1 and 
fence_node node2, it works correctly.
When a service is running on node2 and I disconnect node1 from the 
network (to force the fence), it works correctly too.
My problem is that when a service is running on node1 and I disconnect 
node2 from the network, node2 is not fenced.

Below are the logs from both servers, so you can see how the fencing 
behaves. On node2 you can see that the fence returns success, but not 
on node1.

Have you ever experienced this kind of problem?
Do you have any suggestions on what I should do?


I ran fence_tool dump on the node that does not fence, and it shows this output:

1219238621 stop default
1219238621 start default 4 members 1
1219238621 do_recovery stop 1 start 4 finish 1
1219238621 add node 2 to list 1
1219238621 averting fence of node 10.0.20.35
1219238621 finish default 4
1219238681 client 4: dump


Does anyone know why it shows the message "averting fence of node" and 
does not fence the node?
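
To make it easier to spot this message in longer dumps, here is a minimal Python sketch that scans fence_tool dump output for "averting fence of node" lines. It only assumes the line layout seen in the output above (a leading number, which looks like a Unix timestamp, followed by the message); the function name find_averted_fences is made up for this example and is not part of any cluster tool.

```python
import re

# Sample lines copied from the fence_tool dump output in this message.
DUMP = """\
1219238621 stop default
1219238621 start default 4 members 1
1219238621 do_recovery stop 1 start 4 finish 1
1219238621 add node 2 to list 1
1219238621 averting fence of node 10.0.20.35
1219238621 finish default 4
1219238681 client 4: dump
"""

# Assumed layout: "<number> averting fence of node <name>".
AVERT_RE = re.compile(r"^(\d+)\s+averting fence of node\s+(\S+)\s*$")


def find_averted_fences(dump_text):
    """Return (leading_number, node) pairs for every 'averting fence' line."""
    hits = []
    for line in dump_text.splitlines():
        m = AVERT_RE.match(line)
        if m:
            hits.append((int(m.group(1)), m.group(2)))
    return hits


if __name__ == "__main__":
    for stamp, node in find_averted_fences(DUMP):
        print(f"{stamp}: fence of {node} was averted")
```

The script only reports where the message occurs; it does not explain why fenced decided to avert the fence.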

Thanks for the help.



node1:
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering GATHER state from 0.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Creating commit token because I am the rep.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Saving state aru 57 high seq received 57
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Storing new sequence id for ring 1edb5c
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering COMMIT state.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering RECOVERY state.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] position [0] member 10.0.20.34:
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] previous ring seq 2022232 rep 10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] aru 57 high delivered 57 received flag 1
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Did not need to originate any messages in recovery.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] Sending initial ORF token
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] New Configuration:
Aug 18 11:11:46 node1 kernel: dlm: closing connection to node 2
Aug 18 11:11:46 node1 fenced: 10.0.20.35 not a cluster member after 0 sec post_fail_delay
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.34)
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Left:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.35)
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Joined:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] New Configuration:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ]    r(0) ip(10.0.20.34)
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Left:
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] Members Joined:
Aug 18 11:11:46 node1 openais[3515]: [SYNC ] This node is within the primary component and will provide service.
Aug 18 11:11:46 node1 openais[3515]: [TOTEM] entering OPERATIONAL state.
Aug 18 11:11:46 node1 openais[3515]: [CLM  ] got nodejoin message 10.0.20.34
Aug 18 11:11:46 node1 openais[3515]: [CPG  ] got joinlist message from node 1



node2:
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering GATHER state from 0.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Creating commit token because I am the rep.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Saving state aru 53 high seq received 53
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Storing new sequence id for ring 1edb7c
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering COMMIT state.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] entering RECOVERY state.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] position [0] member 10.0.20.35:
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] previous ring seq 2022264 rep 10.0.20.34
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] aru 53 high delivered 53 received flag 1
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Did not need to originate any messages in recovery.
Aug 18 15:55:52 node2 openais[5232]: [TOTEM] Sending initial ORF token
Aug 18 15:55:52 node2 openais[5232]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 15:55:52 node2 openais[5232]: [CLM  ] New Configuration:
Aug 18 15:55:52 node2 kernel: dlm: closing connection to node 1
Aug 18 15:55:52 node2 fenced[5248]: 10.0.20.34 not a cluster member after 0 sec post_fail_delay
Aug 18 15:55:52 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.35)
Aug 18 15:55:52 node2 fenced[5248]: fencing node "10.0.20.34"
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Left:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.34)
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Joined:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] CLM CONFIGURATION CHANGE
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] New Configuration:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ]    r(0) ip(10.0.20.35)
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Left:
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] Members Joined:
Aug 18 15:55:53 node2 openais[5232]: [SYNC ] This node is within the primary component and will provide service.
Aug 18 15:55:53 node2 openais[5232]: [TOTEM] entering OPERATIONAL state.
Aug 18 15:55:53 node2 openais[5232]: [CLM  ] got nodejoin message 10.0.20.35
Aug 18 15:55:53 node2 openais[5232]: [CPG  ] got joinlist message from node 2
Aug 18 15:55:59 node2 fenced[5248]: fence "10.0.20.34" success
Aug 18 15:56:00 node2 clurgmgrd[5507]: <notice> Taking over service service:FirewallClusta from down member 10.0.20.34
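
The difference between the two logs can be checked mechanically. The following is a rough Python sketch, not an official tool, that reads syslog text and reports how far fenced got for each lost node: only detected, fencing started, or fence succeeded. The regular expressions are assumptions based solely on the three fenced message shapes that appear in the logs above; the function name fence_outcomes and the stage labels are invented for illustration.

```python
import re

# Patterns assumed from the fenced lines in the logs above.
# The "[pid]" part is optional because node1 logs "fenced:" without it.
NOT_MEMBER_RE = re.compile(r"fenced(?:\[\d+\])?: (\S+) not a cluster member")
FENCING_RE = re.compile(r'fenced(?:\[\d+\])?: fencing node "([^"]+)"')
SUCCESS_RE = re.compile(r'fenced(?:\[\d+\])?: fence "([^"]+)" success')

# Relevant fenced lines copied from the node1 and node2 logs.
NODE1_LOG = (
    "Aug 18 11:11:46 node1 fenced: 10.0.20.35 not a cluster member "
    "after 0 sec post_fail_delay\n"
)
NODE2_LOG = (
    "Aug 18 15:55:52 node2 fenced[5248]: 10.0.20.34 not a cluster member "
    "after 0 sec post_fail_delay\n"
    'Aug 18 15:55:52 node2 fenced[5248]: fencing node "10.0.20.34"\n'
    'Aug 18 15:55:59 node2 fenced[5248]: fence "10.0.20.34" success\n'
)


def fence_outcomes(log_text):
    """Map each lost node to the last fencing stage seen in the log."""
    outcomes = {}
    for line in log_text.splitlines():
        m = NOT_MEMBER_RE.search(line)
        if m:
            # Do not overwrite a later stage if the node is reported again.
            outcomes.setdefault(m.group(1), "detected")
            continue
        m = FENCING_RE.search(line)
        if m:
            outcomes[m.group(1)] = "started"
            continue
        m = SUCCESS_RE.search(line)
        if m:
            outcomes[m.group(1)] = "success"
    return outcomes


if __name__ == "__main__":
    print("node1 view:", fence_outcomes(NODE1_LOG))
    print("node2 view:", fence_outcomes(NODE2_LOG))
```

Run against the excerpts above, node2 reaches "success" for 10.0.20.34, while node1 never gets past "detected" for 10.0.20.35, which matches the behaviour described in this message.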

-- 
Bruno F. Deschamps - Consultant
LPIC-1 Certified Professional
--------------------------------------------------------------------
Redix - IT Management with Free Software
http://www.redix.com.br - redix at redix.com.br
Business phone: +55 (47) 3323-7313
--------------------------------------------------------------------



