[Linux-cluster] Cman doesn't realize the failed node
Hakan VELIOGLU
veliogluh at itu.edu.tr
Fri Nov 14 14:31:49 UTC 2008
Hi,
I solved my problem. When the kernel IP forwarding feature
(/proc/sys/net/ipv4/ip_forward) is set to 0, the cluster nodes do not
detect the failure. I am posting this solution to help others.
However, I am curious: is the ip_forward setting enabled by default on
all of your Red Hat 5 installations? Are all of your failover clusters
working as expected?
Have a nice day list.
PS: This setting is mentioned only in the Red Hat 4 Cluster Suite
documentation, not in the Red Hat 5 Cluster Suite documentation. Interesting!
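For anyone hitting the same symptom, a minimal sketch of checking and enabling the setting (assuming a stock RHEL 5 host; the runtime change needs root, and the sysctl.conf line makes it survive a reboot):

```shell
# Check the current value: 0 = forwarding disabled, 1 = enabled
cat /proc/sys/net/ipv4/ip_forward

# Enable it at runtime (as root)
sysctl -w net.ipv4.ip_forward=1

# Persist across reboots by adding this line to /etc/sysctl.conf:
#   net.ipv4.ip_forward = 1
# then apply it with:
sysctl -p
```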
----- Message from veliogluh at itu.edu.tr ---------
Date: Wed, 12 Nov 2008 13:17:00 +0200
From: Hakan VELIOGLU <veliogluh at itu.edu.tr>
Reply-To: linux clustering <linux-cluster at redhat.com>
Subject: [Linux-cluster] Cman doesn't realize the failed node
To: linux clustering <linux-cluster at redhat.com>
> Hi,
>
> I am testing and trying to understand the cluster environment. I have
> built a two-node cluster without any services (Red Hat EL 5.2
> x64). I started the cman and rgmanager services successfully and then
> powered off one node abruptly. After this I expected the other node
> to detect the failure and take over all the resources; however, the
> running node does not notice it. The "cman_tool nodes" and
> "clustat" commands still report the failed node as active and
> online. What am I missing? Why doesn't cman detect the failure?
>
> [root at cl1 ~]# cat /etc/cluster/cluster.conf
> <?xml version="1.0" ?>
> <cluster alias="kume" config_version="54" name="kume">
> <totem token="1000" hold="100"/>
> <fence_daemon post_fail_delay="0" post_join_delay="3"/>
> <clusternodes>
> <clusternode name="cl2.cc.itu.edu.tr" nodeid="1" votes="1">
> <fence/>
> </clusternode>
> <clusternode name="cl1.cc.itu.edu.tr" nodeid="2" votes="1">
> <fence/>
> </clusternode>
> </clusternodes>
> <cman expected_votes="1" two_node="1"/>
> <fencedevices/>
> <rm>
> <failoverdomains>
> <failoverdomain name="domain" ordered="1"
> restricted="1">
> <failoverdomainnode
> name="cl2.cc.itu.edu.tr" priority="1"/>
> <failoverdomainnode
> name="cl1.cc.itu.edu.tr" priority="2"/>
> </failoverdomain>
> </failoverdomains>
> <resources/>
> <service autostart="0" domain="domain"
> name="veritabani" recovery="restart"/>
> </rm>
> </cluster>
> [root at cl1 ~]#
>
>
> When the node goes down, openais repeatedly logs TOTEM messages like this:
> Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
> Nov 12 13:12:57 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
> Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
> Nov 12 13:13:03 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
> Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
> Nov 12 13:13:09 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
> Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] The consensus timeout expired.
> Nov 12 13:13:14 cl1 openais[5809]: [TOTEM] entering GATHER state from 3.
>
>
>
> Hakan
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
----- End of message from veliogluh at itu.edu.tr -----