[Linux-cluster] stop responding rgmanager
Stewart Walters
stewart at epits.com.au
Tue Jan 20 11:42:48 UTC 2009
Ghe Rivero wrote:
> Hi everyone,
> I've been fighting with a 2-node cluster for the last few days, and
> I've finally given up.
> I'm having problems with the clurgmgrd daemon. It stops responding
> when I restart the cluster (just the cluster, not the services or the
> nodes) and becomes unkillable. The only way to recover is to restart
> the nodes, but as you can imagine that's not a solution.
>
> I'm using Conga to configure it. Any ideas?
>
> Ghe Rivero
> <?xml version="1.0"?>
> <cluster alias="AAA" config_version="14" name="AAA">
> <quorumd interval="3" label="quorumlnx"
> status_file="/tmp/qdisk-status" tko="23" votes="1"/>
> <cman deadnode_timeout="135" expected_nodes="3"/>
> <fence_daemon clean_start="0" post_fail_delay="0"
> post_join_delay="3"/>
> <clusternodes>
> <clusternode name="node1.fqdn" nodeid="1" votes="1">
> <fence>
> <method name="1">
> <device name="iLO-node1"/>
> </method>
> </fence>
> </clusternode>
> <clusternode name="node2.fqdn" nodeid="2" votes="1">
> <fence>
> <method name="1">
> <device name="iLO-node2"/>
> </method>
> </fence>
> </clusternode>
> </clusternodes>
> <cman expected_votes="3" two_node="0"/>
> <fencedevices>
> <fencedevice agent="fence_ilo" hostname="10.110.65.6"
> login="login" name="iLO-node1" passwd="Y"/>
> <fencedevice agent="fence_ilo" hostname="10.110.65.7"
> login="login" name="iLO-node2" passwd="Y"/>
> </fencedevices>
> <rm>
> <failoverdomains>
> <failoverdomain name="Web" ordered="1"
> restricted="1">
> <failoverdomainnode name="node1.fqdn"
> priority="1"/>
> <failoverdomainnode name="node2.fqdn"
> priority="2"/>
> </failoverdomain>
> </failoverdomains>
> <resources>
> <script file="/etc/init.d/httpd" name="Apache"/>
> <ip address="10.110.65.30" monitor_link="1"/>
> </resources>
> <service autostart="1" domain="Web" exclusive="1"
> name="Web">
> <script ref="Apache"/>
> </service>
> </rm>
> </cluster>

Your cluster.conf looks a little out of whack for a 2-node cluster. It
looks as if it's designed for a 3-node cluster, but you've only defined
two nodes. This will get you into trouble (I know from experience) :-)

You've got duplicate cman entries, which do not look right (although I'm
pretty new to RHCS myself, so I wouldn't consider myself an authority on
the matter). See <cman deadnode_timeout="135" expected_nodes="3"/> and
<cman expected_votes="3" two_node="0"/>.

I would have thought they should be in a single combined cman directive
such as <cman deadnode_timeout="135" expected_votes="2" two_node="1"/>.
The expected votes would be 2, because in the event of split brain you'll
want 1 node + quorum disk to remain a quorate cluster.

In my cluster.conf, <cman> is defined after </clusternodes>. I'm not
sure if it makes a difference, but I would suggest removing the topmost
cman directive and merging its parameters into the bottom directive.

Also, do you need a quorum disk? A two-node cluster can have one but
does not need one to operate. If you don't, set expected_votes="1".

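For the case without a quorum disk, the change might look roughly like
the fragment below. This is a minimal, untested sketch that assumes the
rest of the posted cluster.conf stays as it is. It uses the usual
two-node settings (two_node="1" with expected_votes="1"). Whether
deadnode_timeout="135" from the original file should be carried over
into this element is left open.

```xml
<!-- Sketch only: two-node cluster without a quorum disk. -->
<!-- The <quorumd .../> element from the posted file is removed. -->
<!-- A single <cman> element replaces both existing ones, placed -->
<!-- after </clusternodes> as described above. -->
<cman expected_votes="1" two_node="1"/>
```
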
See how you go.

Regards,
Stewart