[Linux-cluster] [cman] can't join cluster after reboot
Vishesh kumar
linuxtovishesh at gmail.com
Thu Nov 7 13:11:27 UTC 2013
My understanding is that the node was fenced while rebooting. I suggest you
look into the fencing logs as well. If your fencing logs are not detailed
enough, use the following in cluster.conf to enable debug logging:
<logging>
<logging_daemon name="fenced" debug="on"/>
</logging>
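After adding that, bump config_version, propagate the config, and watch
fenced's output. A quick sketch, assuming the stock CentOS 6 paths:

ccs_config_validate
cman_tool version -r
tail -f /var/log/cluster/fenced.log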
Thanks
On Thu, Nov 7, 2013 at 5:34 PM, Yuriy Demchenko <demchenko.ya at gmail.com> wrote:
> Hi,
>
> I'm trying to set up a 3-node cluster (2 nodes + 1 standby node for quorum)
> with the cman+pacemaker stack, everything according to this quickstart
> article: http://clusterlabs.org/quickstart-redhat.html
>
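> For reference, a minimal cluster.conf of the kind that quickstart produces
> would look roughly like this (a sketch reconstructed from the names and
> config version shown below, fencing stanzas omitted; my real file is in the
> attachment):
>
> <?xml version="1.0"?>
> <cluster config_version="10" name="ocluster">
>   <cman expected_votes="3"/>
>   <clusternodes>
>     <clusternode name="node-1.spb.stone.local" nodeid="1" votes="1"/>
>     <clusternode name="node-2.spb.stone.local" nodeid="2" votes="1"/>
>     <clusternode name="vnode-3.spb.stone.local" nodeid="3" votes="1"/>
>   </clusternodes>
> </cluster>
>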
> The cluster starts, all nodes see each other, quorum is gained, stonith is
> working, but I've run into a problem with cman: a node can't join the
> cluster after a reboot. cman starts, and cman_tool nodes reports only that
> node as a cluster member, while on the other 2 nodes it reports 2 nodes as
> cluster members and the 3rd as offline. cman stop/start/restart on the
> problem node has no effect - it still sees only itself. But if I restart
> cman on one of the working nodes, everything goes back to normal: all 3
> nodes rejoin the cluster, and subsequent cman service restarts on any node
> work fine - the node leaves the cluster and rejoins successfully. But
> again, only until the node's OS is rebooted.
>
> For example:
> [1] Working cluster:
>
>> [root at node-1 ~]# cman_tool nodes
>> Node Sts Inc Joined Name
>> 1 M 592 2013-11-07 15:20:54 node-1.spb.stone.local
>> 2 M 760 2013-11-07 15:20:54 node-2.spb.stone.local
>> 3 M 760 2013-11-07 15:20:54 vnode-3.spb.stone.local
>> [root at node-1 ~]# cman_tool status
>> Version: 6.2.0
>> Config Version: 10
>> Cluster Name: ocluster
>> Cluster Id: 2059
>> Cluster Member: Yes
>> Cluster Generation: 760
>> Membership state: Cluster-Member
>> Nodes: 3
>> Expected votes: 3
>> Total votes: 3
>> Node votes: 1
>> Quorum: 2
>> Active subsystems: 7
>> Flags:
>> Ports Bound: 0
>> Node name: node-1.spb.stone.local
>> Node ID: 1
>> Multicast addresses: 239.192.8.19
>> Node addresses: 192.168.220.21
>>
> The picture is the same on all 3 nodes (except for node name and id) - same
> cluster name, cluster id, and multicast address.
>
> [2] I rebooted node-1. After the reboot completed, "cman_tool nodes" on
> node-2 and vnode-3 shows this:
>
>> Node Sts Inc Joined Name
>> 1 X 760 node-1.spb.stone.local
>> 2 M 588 2013-11-07 15:11:23 node-2.spb.stone.local
>> 3 M 760 2013-11-07 15:20:54 vnode-3.spb.stone.local
>> [root at node-2 ~]# cman_tool status
>> Version: 6.2.0
>> Config Version: 10
>> Cluster Name: ocluster
>> Cluster Id: 2059
>> Cluster Member: Yes
>> Cluster Generation: 764
>> Membership state: Cluster-Member
>> Nodes: 2
>> Expected votes: 3
>> Total votes: 2
>> Node votes: 1
>> Quorum: 2
>> Active subsystems: 7
>> Flags:
>> Ports Bound: 0
>> Node name: node-2.spb.stone.local
>> Node ID: 2
>> Multicast addresses: 239.192.8.19
>> Node addresses: 192.168.220.22
>>
> But on the rebooted node-1 it shows this:
>
>> Node Sts Inc Joined Name
>> 1 M 764 2013-11-07 15:49:01 node-1.spb.stone.local
>> 2 X 0 node-2.spb.stone.local
>> 3 X 0 vnode-3.spb.stone.local
>> [root at node-1 ~]# cman_tool status
>> Version: 6.2.0
>> Config Version: 10
>> Cluster Name: ocluster
>> Cluster Id: 2059
>> Cluster Member: Yes
>> Cluster Generation: 776
>> Membership state: Cluster-Member
>> Nodes: 1
>> Expected votes: 3
>> Total votes: 1
>> Node votes: 1
>> Quorum: 2 Activity blocked
>> Active subsystems: 7
>> Flags:
>> Ports Bound: 0
>> Node name: node-1.spb.stone.local
>> Node ID: 1
>> Multicast addresses: 239.192.8.19
>> Node addresses: 192.168.220.21
>>
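> (The "Activity blocked" part is self-consistent: cman computes quorum as
> expected_votes/2 + 1 = 3/2 + 1 = 2 votes, and node-1 alone holds only 1
> vote, so it blocks activity until it sees another node.)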
> So: same cluster name, cluster id, and multicast address - but it can't see
> the other nodes. And there is nothing in /var/log/messages or
> /var/log/cluster/corosync.log on the other two nodes - they don't seem to
> notice node-1 coming back online at all; the last records are about node-1
> leaving the cluster.
>
> [3] If I now do "service cman restart" on node-2 or vnode-3, everything
> goes back to normal operation as in [1].
> In the logs it shows up as node-2 leaving the cluster (service stop) and
> then node-2 and node-1 joining simultaneously (service start):
>
>> Nov 7 11:47:06 vnode-3 corosync[26692]: [QUORUM] Members[2]: 2 3
>> Nov 7 11:47:06 vnode-3 corosync[26692]: [TOTEM ] A processor joined or
>> left the membership and a new membership was formed.
>> Nov 7 11:47:06 vnode-3 kernel: dlm: closing connection to node 1
>> Nov 7 11:47:06 vnode-3 corosync[26692]: [CPG ] chosen downlist:
>> sender r(0) ip(192.168.220.22) ; members(old:3 left:1)
>> Nov 7 11:47:06 vnode-3 corosync[26692]: [MAIN ] Completed service
>> synchronization, ready to provide service.
>> Nov 7 11:53:28 vnode-3 corosync[26692]: [QUORUM] Members[1]: 3
>> Nov 7 11:53:28 vnode-3 corosync[26692]: [TOTEM ] A processor joined or
>> left the membership and a new membership was formed.
>> Nov 7 11:53:28 vnode-3 corosync[26692]: [CPG ] chosen downlist:
>> sender r(0) ip(192.168.220.14) ; members(old:2 left:1)
>> Nov 7 11:53:28 vnode-3 corosync[26692]: [MAIN ] Completed service
>> synchronization, ready to provide service.
>> Nov 7 11:53:28 vnode-3 kernel: dlm: closing connection to node 2
>> Nov 7 11:53:30 vnode-3 corosync[26692]: [TOTEM ] A processor joined or
>> left the membership and a new membership was formed.
>> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[2]: 1 3
>> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[2]: 1 3
>> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
>> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
>> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
>> Nov 7 11:53:30 vnode-3 corosync[26692]: [CPG ] chosen downlist:
>> sender r(0) ip(192.168.220.21) ; members(old:1 left:0)
>> Nov 7 11:53:30 vnode-3 corosync[26692]: [MAIN ] Completed service
>> synchronization, ready to provide service.
>>
>
> I've set up such a cluster before in much the same configuration and never
> had any problems, but now I'm completely stuck.
> So, what is wrong with my cluster, and how do I fix it?
>
> OS: CentOS 6.4 with the latest updates, firewall disabled, selinux
> permissive, all 3 nodes on the same network. Multicast is working - checked
> with omping.
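> (Run concurrently on all three nodes against the full node list, roughly
> like this - taking vnode-3's address as 192.168.220.14 from the downlist
> line above:
>
> omping 192.168.220.21 192.168.220.22 192.168.220.14
>
> with multicast replies seen from every peer.)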
> cman.x86_64 3.0.12.1-49.el6_4.2 @centos6-updates
> corosync.x86_64 1.4.1-15.el6_4.1 @centos6-updates
> pacemaker.x86_64 1.1.10-1.el6_4.4 @centos6-updates
>
> cluster.conf is attached.
>
> --
> Yuriy Demchenko
>
>
--
http://linuxmantra.com