[Linux-cluster] Cluster services die when nonactive node is rebooted

Sun Jul 25 06:17:52 UTC 2010

Try setting the following in your cluster.conf file:

<cman expected_votes="3" quorum_dev_poll="35000" >

                <multicast addr="224.0.0.1" interface="eth0"/>

        </cman>

---
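The value is calculated so that
quorum_dev_poll > (interval * tko)

quorum_dev_poll is given in milliseconds, while the quorumd interval is in
seconds. With the quorumd settings below, 5 * 6 = 30 seconds, so
quorum_dev_poll is set to 35000 (35 seconds):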
<quorumd interval="5" label="delta_qdisk" min_score="1" tko="6" votes="1">

                <heuristic interval="5" program="ping -t1 -c1 192.168.1.1" score="1"/>

        </quorumd>
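
If you want to check this rule on other clusters, here is a minimal Python
sketch of the same arithmetic. It is only an illustration: the function names
(qdisk_timeout_ms, poll_is_sufficient) and the trimmed-down config string are
made up for the example, and it assumes quorumd and cman are direct children
of the cluster element, as in the config quoted below.

```python
import xml.etree.ElementTree as ET

# Trimmed-down cluster.conf using the values from this thread (illustrative only).
CLUSTER_CONF = """<cluster name="delta_cluster">
  <quorumd interval="5" label="delta_qdisk" min_score="1" tko="6" votes="1"/>
  <cman expected_votes="3" quorum_dev_poll="35000"/>
</cluster>"""


def qdisk_timeout_ms(root):
    """Return interval * tko from <quorumd>, converted from seconds to ms."""
    quorumd = root.find("quorumd")
    return int(quorumd.get("interval")) * int(quorumd.get("tko")) * 1000


def poll_is_sufficient(root):
    """Check the rule quorum_dev_poll > (interval * tko)."""
    poll = root.find("cman").get("quorum_dev_poll")
    if poll is None:
        # Attribute not set: this sketch does not guess cman's built-in default.
        return False
    return int(poll) > qdisk_timeout_ms(root)


root = ET.fromstring(CLUSTER_CONF)
print(qdisk_timeout_ms(root))    # 5 * 6 * 1000
print(poll_is_sufficient(root))  # 35000 > 30000
```

To check a real file, replace ET.fromstring(CLUSTER_CONF) with
ET.parse("/etc/cluster/cluster.conf").getroot().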

For more information, see the following documents:
https://access.redhat.com/kb/docs/DOC-2882
http://people.redhat.com/lhh/cmanvsqdisk.png

On Sat, Jul 24, 2010 at 3:50 AM, Eric Schneider <eschneid at uccs.edu> wrote:

> I have a few 2 node clusters and I notice that recently the clusters lose
> quorum when I reboot the node without running services.  I could do this in
> the past without any problems.  CentOS 5.5 on ESX 4.0 u1.  Maybe a bug with
> a new kernel or cman software?
>
>
>
> I get the following right away when the node reboots:
>
> Jul 23 16:02:32 happy5 clurgmgrd[4269]: <notice> Member 2 shutting down
>
> Jul 23 16:02:52 happy5 qdiskd[3562]: <info> Node 2 shutdown
>
> Jul 23 16:03:02 happy5 qdiskd[3562]: <info> Assuming master role
>
> Jul 23 16:03:03 happy5 clurgmgrd[4269]: <emerg> #1: Quorum Dissolved
>
> Jul 23 16:03:03 happy5 openais[3533]: [CMAN ] lost contact with quorum
> device
>
> Jul 23 16:03:03 happy5 openais[3533]: [CMAN ] quorum lost, blocking
> activity
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Cluster is not quorate.  Refusing
> connection.
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing connect:
> Connection refused
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Cluster is not quorate.  Refusing
> connection.
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing connect:
> Connection refused
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Invalid descriptor specified (-111).
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Someone may be attempting something
> evil.
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing get: Invalid
> request descriptor
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Invalid descriptor specified (-111).
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Someone may be attempting something
> evil.
>
> Jul 23 16:03:03 happy5 ccsd[3493]: Error while processing get: Invalid
> request descriptor
>
>
>
> <?xml version="1.0"?>
>
> <cluster alias="delta_cluster" config_version="40" name="delta_cluster">
>
>         <fence_daemon post_fail_delay="5" post_join_delay="120"/>
>
>         <quorumd interval="5" label="delta_qdisk" min_score="1" tko="6"
> votes="1">
>
>                 <heuristic interval="5" program="ping -t1 -c1 192.168.1.1"
> score="1"/>
>
>         </quorumd>
>
>         <clusternodes>
>
>                 <clusternode name="node1" nodeid="1" votes="1">
>
>                         <fence>
>
>                                 <method name="1">
>
>                                         <device name="node1"/>
>
>                                 </method>
>
>                         </fence>
>
>                 </clusternode>
>
>                 <clusternode name="node2" nodeid="2" votes="1">
>
>                         <fence>
>
>                                 <method name="1">
>
>                                         <device name="node2"/>
>
>                                 </method>
>
>                         </fence>
>
>                 </clusternode>
>
>         </clusternodes>
>
>         <cman expected_votes="3">
>
>                 <multicast addr="224.0.0.1" interface="eth0"/>
>
>         </cman>
>
>         <fencedevices>
>
>                 <fencedevice agent="fence_manual" name="fence_manual"/>
>
>                 <fencedevice agent="fence_vmware" ipaddr="bob"
> login="username" name="node1" passwd="password" port="node1"/>
>
>                 <fencedevice agent="fence_vmware" ipaddr="bob"
> login="username" name="node2" passwd="password" port="node2"/>
>
>         </fencedevices>
>
>         <rm>
>
>                 <failoverdomains>
>
>                         <failoverdomain name="node1" ordered="0"
> restricted="1">
>
>                                 <failoverdomainnode name="node1"
> priority="1"/>
>
>                         </failoverdomain>
>
>                         <failoverdomain name="node2" restricted="1">
>
>                                 <failoverdomainnode name="node2"
> priority="1"/>
>
>                         </failoverdomain>
>
>                         <failoverdomain name="failover_pro-http"
> restricted="0">
>
>                                 <failoverdomainnode name="node1"
> priority="1"/>
>
>                                 <failoverdomainnode name="node2"
> priority="1"/>
>
>                         </failoverdomain>
>
>                 </failoverdomains>
>
>
>
>         </rm>
>
>         <totem token="21000"/>
>
> </cluster>
>
>
>
> Thanks,
>
>
>
> Eric
>
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>