[Linux-cluster] [cman] can't join cluster after reboot
Yuriy Demchenko
demchenko.ya at gmail.com
Thu Nov 7 13:27:48 UTC 2013
Nope, nothing in the logs suggests the node is fenced while rebooting.
Moreover, the same behaviour persists with pacemaker started - and I had
explicitly put the node into standby in pacemaker before the reboot.
The same behaviour also persists with stonith-enabled=false, and after a
manual node fence via "stonith_admin --reboot node-1.spb.stone.local".
So I suppose fencing isn't the issue here.
Yuriy Demchenko
On 11/07/2013 05:11 PM, Vishesh kumar wrote:
> My understanding is that the node is fenced while rebooting. I suggest
> you look into the fencing logs as well. If your fencing logs are not
> detailed enough, use the following in cluster.conf to enable debug
> logging:
>
> <logging>
> <logging_daemon name="fenced" debug="on"/>
> </logging>
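> 
> After editing cluster.conf, remember to bump config_version and push
> the new config to all nodes - something like this, assuming the stock
> cman tools:
> 
> [root at node-1 ~]# ccs_config_validate
> [root at node-1 ~]# cman_tool version -r
> 
> The fenced debug output should then land in /var/log/cluster/fenced.log.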
>
> Thanks
>
>
> On Thu, Nov 7, 2013 at 5:34 PM, Yuriy Demchenko
> <demchenko.ya at gmail.com> wrote:
>
> Hi,
>
> I'm trying to set up a 3-node cluster (2 nodes + 1 standby node for
> quorum) with the cman+pacemaker stack, everything according to this
> quickstart article: http://clusterlabs.org/quickstart-redhat.html
>
> The cluster starts, all nodes see each other, quorum is gained, and
> stonith is working, but I've run into a problem with cman: a node
> can't join the cluster after a reboot - cman starts, but "cman_tool
> nodes" reports only that node as a cluster member, while on the other
> 2 nodes it reports 2 nodes as cluster members and the 3rd as offline.
> cman stop/start/restart on the problem node has no effect - it still
> sees only itself. But if I do a cman restart on one of the working
> nodes, everything goes back to normal: all 3 nodes join the cluster,
> and subsequent cman service restarts on any node work fine - the node
> leaves the cluster and rejoins successfully. But again - only until
> the node's OS is rebooted.
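> 
> (The restarts above are nothing special - just the stock init script,
> e.g.:
> 
> [root at node-1 ~]# service cman restart
> 
> and likewise "service cman stop" / "service cman start".)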
>
> For example:
> [1] Working cluster:
>
> [root at node-1 ~]# cman_tool nodes
> Node Sts Inc Joined Name
> 1 M 592 2013-11-07 15:20:54 node-1.spb.stone.local
> 2 M 760 2013-11-07 15:20:54 node-2.spb.stone.local
> 3 M 760 2013-11-07 15:20:54 vnode-3.spb.stone.local
> [root at node-1 ~]# cman_tool status
> Version: 6.2.0
> Config Version: 10
> Cluster Name: ocluster
> Cluster Id: 2059
> Cluster Member: Yes
> Cluster Generation: 760
> Membership state: Cluster-Member
> Nodes: 3
> Expected votes: 3
> Total votes: 3
> Node votes: 1
> Quorum: 2
> Active subsystems: 7
> Flags:
> Ports Bound: 0
> Node name: node-1.spb.stone.local
> Node ID: 1
> Multicast addresses: 239.192.8.19
> Node addresses: 192.168.220.21
>
> The picture is the same on all 3 nodes (except for node name and id) -
> same cluster name, cluster id, and multicast address.
>
> [2] I rebooted node-1. After the reboot completed, "cman_tool
> nodes" on node-2 and vnode-3 shows this:
>
> Node Sts Inc Joined Name
> 1 X 760 node-1.spb.stone.local
> 2 M 588 2013-11-07 15:11:23 node-2.spb.stone.local
> 3 M 760 2013-11-07 15:20:54 vnode-3.spb.stone.local
> [root at node-2 ~]# cman_tool status
> Version: 6.2.0
> Config Version: 10
> Cluster Name: ocluster
> Cluster Id: 2059
> Cluster Member: Yes
> Cluster Generation: 764
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 3
> Total votes: 2
> Node votes: 1
> Quorum: 2
> Active subsystems: 7
> Flags:
> Ports Bound: 0
> Node name: node-2.spb.stone.local
> Node ID: 2
> Multicast addresses: 239.192.8.19
> Node addresses: 192.168.220.22
>
> But on the rebooted node-1 it shows this:
>
> Node Sts Inc Joined Name
> 1 M 764 2013-11-07 15:49:01 node-1.spb.stone.local
> 2 X 0 node-2.spb.stone.local
> 3 X 0 vnode-3.spb.stone.local
> [root at node-1 ~]# cman_tool status
> Version: 6.2.0
> Config Version: 10
> Cluster Name: ocluster
> Cluster Id: 2059
> Cluster Member: Yes
> Cluster Generation: 776
> Membership state: Cluster-Member
> Nodes: 1
> Expected votes: 3
> Total votes: 1
> Node votes: 1
> Quorum: 2 Activity blocked
> Active subsystems: 7
> Flags:
> Ports Bound: 0
> Node name: node-1.spb.stone.local
> Node ID: 1
> Multicast addresses: 239.192.8.19
> Node addresses: 192.168.220.21
>
> So: same cluster name, cluster id, multicast address - but it can't
> see the other nodes. And there is nothing in /var/log/messages or
> /var/log/cluster/corosync.log on the other two nodes - they don't
> seem to notice node-1 coming back online at all; the last records
> are about node-1 leaving the cluster.
>
> [3] If I now do "service cman restart" on node-2 or vnode-3,
> everything goes back to normal operation as in [1].
> In the logs this shows up as node-2 leaving the cluster (service stop)
> and then both node-2 and node-1 joining simultaneously (service start):
>
> Nov 7 11:47:06 vnode-3 corosync[26692]: [QUORUM] Members[2]: 2 3
> Nov 7 11:47:06 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 7 11:47:06 vnode-3 kernel: dlm: closing connection to node 1
> Nov 7 11:47:06 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.22) ; members(old:3 left:1)
> Nov 7 11:47:06 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
> Nov 7 11:53:28 vnode-3 corosync[26692]: [QUORUM] Members[1]: 3
> Nov 7 11:53:28 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 7 11:53:28 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.14) ; members(old:2 left:1)
> Nov 7 11:53:28 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
> Nov 7 11:53:28 vnode-3 kernel: dlm: closing connection to node 2
> Nov 7 11:53:30 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[2]: 1 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[2]: 1 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.21) ; members(old:1 left:0)
> Nov 7 11:53:30 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
>
>
> I've set up such a cluster before in much the same configuration and
> never had any problems, but now I'm completely stuck.
> So, what is wrong with my cluster and how do I fix it?
>
> OS: CentOS 6.4 with latest updates, firewall disabled, selinux
> permissive, all 3 nodes on the same network. Multicast is working -
> checked with omping.
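> For the record, the omping check was along these lines (using the
> cluster's multicast address; the port is arbitrary), with multicast
> replies seen from all three nodes:
> 
> [root at node-1 ~]# omping -m 239.192.8.19 -p 4321 node-1.spb.stone.local node-2.spb.stone.local vnode-3.spb.stone.local
> 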
> cman.x86_64 3.0.12.1-49.el6_4.2 @centos6-updates
> corosync.x86_64 1.4.1-15.el6_4.1 @centos6-updates
> pacemaker.x86_64 1.1.10-1.el6_4.4 @centos6-updates
>
> cluster.conf is attached
>
> --
> Yuriy Demchenko
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
>
> --
> http://linuxmantra.com
>
>