[Linux-cluster] [cman] can't join cluster after reboot
Yuriy Demchenko
demchenko.ya at gmail.com
Thu Nov 7 13:27:48 UTC 2013
Nope, nothing in the logs suggests the node is fenced while rebooting.
Moreover, the same behaviour persists with pacemaker started - and I had
explicitly put the node into standby in pacemaker before the reboot.
The same behaviour also persists with stonith-enabled=false, and after a
manual node fence via "stonith_admin --reboot node-1.spb.stone.local".
So I suppose fencing isn't the issue here.
Yuriy Demchenko
On 11/07/2013 05:11 PM, Vishesh kumar wrote:
> My understanding is that the node is fenced while rebooting. I suggest
> you look into the fencing logs as well. If your fencing logs are not
> detailed enough, use the following in cluster.conf to enable debug
> logging:
>
> <logging>
> <logging_daemon name="fenced" debug="on"/>
> </logging>
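> 
> After editing cluster.conf, remember to bump config_version and push
> the new config to all nodes - something like this, assuming the stock
> cman tools:
> 
> [root at node-1 ~]# ccs_config_validate
> [root at node-1 ~]# cman_tool version -r
> 
> The fenced debug output should then land in /var/log/cluster/fenced.log.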
>
> Thanks
>
>
> On Thu, Nov 7, 2013 at 5:34 PM, Yuriy Demchenko
> <demchenko.ya at gmail.com> wrote:
>
> Hi,
>
> I'm trying to set up a 3-node cluster (2 nodes + 1 standby node for
> quorum) with the cman+pacemaker stack, everything according to this
> quickstart article: http://clusterlabs.org/quickstart-redhat.html
>
> The cluster starts, all nodes see each other, quorum is gained, and
> stonith is working, but I've run into a problem with cman: a node
> can't join the cluster after a reboot - cman starts, but "cman_tool
> nodes" reports only that node as a cluster member, while on the other
> 2 nodes it reports 2 nodes as cluster members and the 3rd as offline.
> cman stop/start/restart on the problem node has no effect - it still
> sees only itself. But if I do a cman restart on one of the working
> nodes, everything goes back to normal: all 3 nodes join the cluster,
> and subsequent cman service restarts on any node work fine - the node
> leaves the cluster and rejoins successfully. But again - only until
> the node's OS is rebooted.
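> 
> (The restarts above are nothing special - just the stock init script,
> e.g.:
> 
> [root at node-1 ~]# service cman restart
> 
> and likewise "service cman stop" / "service cman start".)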
>
> For example:
> [1] Working cluster:
>
> [root at node-1 ~]# cman_tool nodes
> Node Sts Inc Joined Name
> 1 M 592 2013-11-07 15:20:54 node-1.spb.stone.local
> 2 M 760 2013-11-07 15:20:54 node-2.spb.stone.local
> 3 M 760 2013-11-07 15:20:54 vnode-3.spb.stone.local
> [root at node-1 ~]# cman_tool status
> Version: 6.2.0
> Config Version: 10
> Cluster Name: ocluster
> Cluster Id: 2059
> Cluster Member: Yes
> Cluster Generation: 760
> Membership state: Cluster-Member
> Nodes: 3
> Expected votes: 3
> Total votes: 3
> Node votes: 1
> Quorum: 2
> Active subsystems: 7
> Flags:
> Ports Bound: 0
> Node name: node-1.spb.stone.local
> Node ID: 1
> Multicast addresses: 239.192.8.19
> Node addresses: 192.168.220.21
>
> The picture is the same on all 3 nodes (except for node name and id) -
> same cluster name, cluster id, and multicast address.
>
> [2] I rebooted node-1. After the reboot completed, "cman_tool
> nodes" on node-2 and vnode-3 shows this:
>
> Node Sts Inc Joined Name
> 1 X 760 node-1.spb.stone.local
> 2 M 588 2013-11-07 15:11:23 node-2.spb.stone.local
> 3 M 760 2013-11-07 15:20:54 vnode-3.spb.stone.local
> [root at node-2 ~]# cman_tool status
> Version: 6.2.0
> Config Version: 10
> Cluster Name: ocluster
> Cluster Id: 2059
> Cluster Member: Yes
> Cluster Generation: 764
> Membership state: Cluster-Member
> Nodes: 2
> Expected votes: 3
> Total votes: 2
> Node votes: 1
> Quorum: 2
> Active subsystems: 7
> Flags:
> Ports Bound: 0
> Node name: node-2.spb.stone.local
> Node ID: 2
> Multicast addresses: 239.192.8.19
> Node addresses: 192.168.220.22
>
> But on the rebooted node-1 it shows this:
>
> Node Sts Inc Joined Name
> 1 M 764 2013-11-07 15:49:01 node-1.spb.stone.local
> 2 X 0 node-2.spb.stone.local
> 3 X 0 vnode-3.spb.stone.local
> [root at node-1 ~]# cman_tool status
> Version: 6.2.0
> Config Version: 10
> Cluster Name: ocluster
> Cluster Id: 2059
> Cluster Member: Yes
> Cluster Generation: 776
> Membership state: Cluster-Member
> Nodes: 1
> Expected votes: 3
> Total votes: 1
> Node votes: 1
> Quorum: 2 Activity blocked
> Active subsystems: 7
> Flags:
> Ports Bound: 0
> Node name: node-1.spb.stone.local
> Node ID: 1
> Multicast addresses: 239.192.8.19
> Node addresses: 192.168.220.21
>
> So: same cluster name, cluster id, multicast address - but it can't
> see the other nodes. And there is nothing in /var/log/messages or
> /var/log/cluster/corosync.log on the other two nodes - they don't
> seem to notice node-1 coming back online at all; the last records
> are about node-1 leaving the cluster.
>
> [3] If I now do "service cman restart" on node-2 or vnode-3,
> everything goes back to normal operation as in [1].
> In the logs this shows up as node-2 leaving the cluster (service stop)
> and then both node-2 and node-1 joining simultaneously (service start):
>
> Nov 7 11:47:06 vnode-3 corosync[26692]: [QUORUM] Members[2]: 2 3
> Nov 7 11:47:06 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 7 11:47:06 vnode-3 kernel: dlm: closing connection to node 1
> Nov 7 11:47:06 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.22) ; members(old:3 left:1)
> Nov 7 11:47:06 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
> Nov 7 11:53:28 vnode-3 corosync[26692]: [QUORUM] Members[1]: 3
> Nov 7 11:53:28 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 7 11:53:28 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.14) ; members(old:2 left:1)
> Nov 7 11:53:28 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
> Nov 7 11:53:28 vnode-3 kernel: dlm: closing connection to node 2
> Nov 7 11:53:30 vnode-3 corosync[26692]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[2]: 1 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[2]: 1 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [QUORUM] Members[3]: 1 2 3
> Nov 7 11:53:30 vnode-3 corosync[26692]: [CPG ] chosen downlist: sender r(0) ip(192.168.220.21) ; members(old:1 left:0)
> Nov 7 11:53:30 vnode-3 corosync[26692]: [MAIN ] Completed service synchronization, ready to provide service.
>
>
> I've set up such a cluster before in much the same configuration and
> never had any problems, but now I'm completely stuck.
> So, what is wrong with my cluster and how do I fix it?
>
> OS: CentOS 6.4 with latest updates, firewall disabled, selinux
> permissive, all 3 nodes on the same network. Multicast is working -
> checked with omping.
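> For the record, the omping check was along these lines (using the
> cluster's multicast address; the port is arbitrary), with multicast
> replies seen from all three nodes:
> 
> [root at node-1 ~]# omping -m 239.192.8.19 -p 4321 node-1.spb.stone.local node-2.spb.stone.local vnode-3.spb.stone.local
> 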
> cman.x86_64 3.0.12.1-49.el6_4.2 @centos6-updates
> corosync.x86_64 1.4.1-15.el6_4.1 @centos6-updates
> pacemaker.x86_64 1.1.10-1.el6_4.4 @centos6-updates
>
> cluster.conf is attached
>
> --
> Yuriy Demchenko
>
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
>
>
>
> --
> http://linuxmantra.com
>
>