[Linux-cluster] Cman doesn't realize the failed node
Hakan VELIOGLU
veliogluh at itu.edu.tr
Wed Nov 12 11:17:00 UTC 2008
Hi,
I am testing a cluster environment and trying to understand how it
works. I've built a two-node cluster without any services (Red Hat EL
5.2 x64). I start the cman and rgmanager services successfully and then
power off one node abruptly. After this I expect the other node to
notice the failure and take over all the resources; however, the
running node doesn't notice it. The "cman_tool nodes" and "clustat"
commands both report the failed node as active and online.
What am I missing? Why doesn't cman detect the failure?
[root@cl1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster alias="kume" config_version="54" name="kume">
  <totem token="1000" hold="100"/>
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="cl2.cc.itu.edu.tr" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="cl1.cc.itu.edu.tr" nodeid="2" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices/>
  <rm>
    <failoverdomains>
      <failoverdomain name="domain" ordered="1" restricted="1">
        <failoverdomainnode name="cl2.cc.itu.edu.tr" priority="1"/>
        <failoverdomainnode name="cl1.cc.itu.edu.tr" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources/>
    <service autostart="0" domain="domain" name="veritabani" recovery="restart"/>
  </rm>
</cluster>
[root@cl1 ~]#
When the node goes down, TOTEM repeatedly logs messages like this:
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
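
To show how regularly the message repeats, here is a minimal Python sketch that measures the gap between successive "consensus timeout expired" lines in the excerpt above. The function name consensus_gaps and the year argument are hypothetical helpers of mine (syslog timestamps carry no year); nothing here is part of cman or openais.

```python
from datetime import datetime

# Log excerpt copied from the messages above.
LOG = """\
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
"""


def consensus_gaps(log_text, year=2008):
    """Return the seconds between successive consensus-timeout lines.

    Hypothetical helper: the year is an assumption, because syslog
    timestamps do not include one.
    """
    stamps = []
    for line in log_text.splitlines():
        if "The consensus timeout expired" not in line:
            continue
        # The first 15 characters hold the timestamp, e.g. "Nov 12 13:12:57".
        stamps.append(datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(consensus_gaps(LOG))  # prints [6, 6, 5]
```

So the timeout fires roughly every 5 to 6 seconds for as long as the other node stays down.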
Hakan