[Linux-cluster] stop responding rgmanager

Tue Jan 20 11:42:48 UTC 2009

Ghe Rivero wrote:
> Hi everyone,
>     I've been fighting with a 2-node cluster for the last few days, but
> I've finally given up.
> I'm having problems with the clurgmgrd daemon. It stops responding
> when I restart the cluster (just the cluster, not the services or the
> nodes) and becomes unkillable. The only way out of this situation
> is to restart the nodes, but as you can imagine that's not a solution.
>
> I'm using Conga to configure it. Any ideas?
>
> Ghe Rivero
> <?xml version="1.0"?>
> <cluster alias="AAA" config_version="14" name="AAA">
>         <quorumd interval="3" label="quorumlnx" 
> status_file="/tmp/qdisk-status" tko="23" votes="1"/>
>         <cman deadnode_timeout="135" expected_nodes="3"/>
>         <fence_daemon clean_start="0" post_fail_delay="0" 
> post_join_delay="3"/>
>         <clusternodes>
>                 <clusternode name="node1.fqdn" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="iLO-node1"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="node2.fqdn" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="iLO-node2"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="3" two_node="0"/>
>         <fencedevices>
>                 <fencedevice agent="fence_ilo" hostname="10.110.65.6" 
> login="login" name="iLO-node1" passwd="Y"/>
>                 <fencedevice agent="fence_ilo" hostname="10.110.65.7" 
> login="login" name="iLO-node2" passwd="Y"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="Web" ordered="1" 
> restricted="1">
>                                 <failoverdomainnode name="node1.fqdn" 
> priority="1"/>
>                                 <failoverdomainnode name="node2.fqdn" 
> priority="2"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources>
>                         <script file="/etc/init.d/httpd" name="Apache"/>
>                         <ip address="10.110.65.30" monitor_link="1"/>
>                 </resources>
>                 <service autostart="1" domain="Web" exclusive="1" 
> name="Web">
>                         <script ref="Apache"/>
>                 </service>
>         </rm>
> </cluster>

Your cluster.conf looks a little out of whack for a 2-node cluster.  It
looks as if it's designed for a 3-node cluster, but you've only defined
two nodes.  This will get you into trouble (I know from experience) :-)

You've got duplicate cman entries, which does not look right (although
I'm pretty new to RHCS myself, so I wouldn't consider myself an authority
on the matter).  See <cman deadnode_timeout="135" expected_nodes="3"/> and
<cman expected_votes="3" two_node="0"/>.

I would have thought they should be combined into a single cman directive
such as <cman deadnode_timeout="135" expected_votes="2" two_node="1"/>.
The expected votes would be 2, because in the event of split brain you'll
want 1 node + quorum disk to remain a quorate cluster.

In my cluster.conf, <cman> is defined after </clusternodes>.  I'm not
sure if it makes a difference, but I would suggest removing the topmost
cman directive and merging its parameters into the bottom directive.
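
A rough sketch of that layout, for illustration only: the top <cman>
line is gone and a single <cman> element sits after </clusternodes>.
The attribute values are simply copied from the original post, not
recommendations.  Two things here are assumptions of the sketch:
expected_nodes is not carried over, and config_version is bumped from
14 to 15 because the file changed.  Note that expected_votes="3" equals
the votes declared in the posted file (two nodes at one vote each plus
one for the quorum disk); whether 2 or 3 is the right figure is worth
checking against cman(5) and qdisk(5).

```xml
<?xml version="1.0"?>
<cluster alias="AAA" config_version="15" name="AAA">
        <quorumd interval="3" label="quorumlnx"
                 status_file="/tmp/qdisk-status" tko="23" votes="1"/>
        <fence_daemon clean_start="0" post_fail_delay="0"
                      post_join_delay="3"/>
        <clusternodes>
                <!-- node1.fqdn and node2.fqdn entries as in the original post -->
        </clusternodes>
        <!-- The only cman element in the file. Values are copied from the
             two original elements; expected_nodes is left out (assumption). -->
        <cman deadnode_timeout="135" expected_votes="3" two_node="0"/>
        <!-- fencedevices and rm sections as in the original post -->
</cluster>
```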

Also, do you need a quorum disk?  A two-node cluster can have one, but
does not need one to operate.

If you don't, set expected_votes="1".
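
For the case without a quorum disk, a minimal sketch of the relevant
fragment could look like the following.  It assumes that the <quorumd>
line is removed entirely and that two_node="1" is set together with
expected_votes="1", which is how cman(5) describes two-node mode;
deadnode_timeout is carried over from the original post unchanged.

```xml
<!-- Variant without a quorum disk: no <quorumd> element at all. -->
<clusternodes>
        <clusternode name="node1.fqdn" nodeid="1" votes="1">
                <!-- fence section as in the original post -->
        </clusternode>
        <clusternode name="node2.fqdn" nodeid="2" votes="1">
                <!-- fence section as in the original post -->
        </clusternode>
</clusternodes>
<!-- Two-node mode: one surviving node (1 vote) keeps the cluster quorate,
     so working fencing matters even more in this setup. -->
<cman deadnode_timeout="135" expected_votes="1" two_node="1"/>
```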

See how you go.

Regards,

Stewart