[Linux-cluster] CLVM & CMAN live adding nodes
Bjoern Teipel
bjoern.teipel at internetbrands.com
Mon Feb 24 08:39:51 UTC 2014
Hi Fabio,
removing UDPU does not change the behavior: the new node still doesn't join
the cluster and still wants to fence node 01.
It still feels like some sort of split brain.
How do you join a new node: using /etc/init.d/cman start, or using
cman_tool / dlm_tool join?
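
To be clear, the two approaches I mean look roughly like this on CentOS 6
(a sketch; the step-by-step ordering is my assumption):

    # either start the whole stack via the init script:
    /etc/init.d/cman start       # starts corosync, fenced, dlm_controld, ...
    /etc/init.d/clvmd start      # clvmd on top of the DLM

    # ...or join piece by piece:
    cman_tool join               # join the corosync/cman membership
    fence_tool join              # join the fence domain
    /etc/init.d/clvmd start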
Bjoern
On Sat, Feb 22, 2014 at 10:16 PM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
> On 02/22/2014 08:05 PM, Bjoern Teipel wrote:
> > Thanks, Fabio, for replying to my request.
> >
> > I'm using stock CentOS 6.4 packages and no <rm> (rgmanager), just clvmd and the DLM.
> >
> > Name    : cman            Relocations: (not relocatable)
> > Version : 3.0.12.1        Vendor: CentOS
> > Release : 49.el6_4.2      Build Date: Tue 03 Sep 2013 02:18:10 AM PDT
> >
> > Name    : lvm2-cluster    Relocations: (not relocatable)
> > Version : 2.02.98         Vendor: CentOS
> > Release : 9.el6_4.3       Build Date: Tue 05 Nov 2013 07:36:18 AM PST
> >
> > Name    : corosync        Relocations: (not relocatable)
> > Version : 1.4.1           Vendor: CentOS
> > Release : 15.el6_4.1      Build Date: Tue 14 May 2013 02:09:27 PM PDT
> >
> >
> > My question is based on a problem I have been having since January:
> >
> >
> > Whenever I add a new node (I add it to cluster.conf and reload with
> > cman_tool version -r -S; my exact steps are sketched below), I end up
> > in a situation where the new node tries to gain quorum on its own,
> > starts to fence the existing pool master, and appears to create some
> > sort of split cluster. Does live adding work at all? corosync and dlm
> > on the existing nodes do not know about the recently added node.
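> >
> > The exact steps I use are roughly (a sketch; the validate step is
> > optional, and with -S the updated file has to be copied to the other
> > nodes by hand):
> >
> >     # on an existing member, after adding the <clusternode/> entry and
> >     # bumping config_version in /etc/cluster/cluster.conf:
> >     ccs_config_validate        # sanity-check the new configuration
> >     cman_tool version -r -S    # reload config; -S skips ccs_sync
> >
> >     # then on the new node:
> >     /etc/init.d/cman start
> >     /etc/init.d/clvmd start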
>
> I can see you are using UDPU and that could be the culprit. Can you drop
> UDPU and work with multicast?
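>
> Something like this (a sketch; the address is just an example, and cman
> will pick a multicast address automatically if you simply drop
> transport="udpu"):
>
>     <cman>
>       <multicast addr="239.192.18.1"/>
>     </cman>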
>
> Jan/Chrissie: do you remember if we support adding nodes at runtime with
> UDPU?
>
> The standalone node should not have quorum at all and should not be able
> to fence anybody to start with.
>
> >
> > New Node
> > ==========
> >
> > Node Sts Inc Joined Name
> > 1 X 0 hv-1
> > 2 X 0 hv-2
> > 3 X 0 hv-3
> > 4 X 0 hv-4
> > 5 X 0 hv-5
> > 6 M 80 2014-01-07 21:37:42 hv-6   <--- host added
> >
> >
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] The network interface
> > [10.14.18.77] is now up.
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] Using quorum provider
> > quorum_cman
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > corosync cluster quorum service v0.1
> > Jan 7 21:37:42 hv-1 corosync[12564]: [CMAN ] CMAN 3.0.12.1 (built
> > Sep 3 2013 09:17:34) started
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > corosync CMAN membership service 2.90
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > openais checkpoint service B.01.01
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > corosync extended virtual synchrony service
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > corosync configuration service
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > corosync cluster closed process group service v1.01
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > corosync cluster config database access v1.01
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > corosync profile loading service
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] Using quorum provider
> > quorum_cman
> > Jan 7 21:37:42 hv-1 corosync[12564]: [SERV ] Service engine loaded:
> > corosync cluster quorum service v0.1
> > Jan 7 21:37:42 hv-1 corosync[12564]: [MAIN ] Compatibility mode set
> > to whitetank. Using V1 and V2 of the synchronization engine.
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member
> > {10.14.18.65}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member
> > {10.14.18.67}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member
> > {10.14.18.68}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member
> > {10.14.18.70}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member
> > {10.14.18.66}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] adding new UDPU member
> > {10.14.18.77}
> > Jan 7 21:37:42 hv-1 corosync[12564]: [TOTEM ] A processor joined or
> > left the membership and a new membership was formed.
> > Jan 7 21:37:42 hv-1 corosync[12564]: [CMAN ] quorum regained,
> > resuming activity
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] This node is within the
> > primary component and will provide service.
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] Members[1]: 6
> > Jan 7 21:37:42 hv-1 corosync[12564]: [QUORUM] Members[1]: 6
> > Jan 7 21:37:42 hv-1 corosync[12564]: [CPG ] chosen downlist: sender
> > r(0) ip(10.14.18.77) ; members(old:0 left:0)
> > Jan 7 21:37:42 hv-1 corosync[12564]: [MAIN ] Completed service
> > synchronization, ready to provide service.
> > Jan 7 21:37:46 hv-1 fenced[12620]: fenced 3.0.12.1 started
> > Jan 7 21:37:46 hv-1 dlm_controld[12643]: dlm_controld 3.0.12.1 started
> > Jan 7 21:37:47 hv-1 gfs_controld[12695]: gfs_controld 3.0.12.1 started
> > Jan 7 21:37:54 hv-1 fenced[12620]: fencing node hv-b1clcy1
> >
> > sudo -i corosync-objctl | grep member
> >
> > totem.interface.member.memberaddr=hv-1
> > totem.interface.member.memberaddr=hv-2
> > totem.interface.member.memberaddr=hv-3
> > totem.interface.member.memberaddr=hv-4
> > totem.interface.member.memberaddr=hv-5
> > totem.interface.member.memberaddr=hv-6
> > runtime.totem.pg.mrp.srp.members.6.ip=r(0) ip(10.14.18.77)
> > runtime.totem.pg.mrp.srp.members.6.join_count=1
> > runtime.totem.pg.mrp.srp.members.6.status=joined
> >
> >
> > Existing Node
> > =============
> >
> > Member 6 has not been added to the quorum list:
> >
> > Jan 7 21:36:28 hv-1 corosync[7769]: [QUORUM] Members[4]: 1 2 3 5
> > Jan 7 21:37:54 hv-1 corosync[7769]: [TOTEM ] A processor joined or
> > left the membership and a new membership was formed.
> > Jan 7 21:37:54 hv-1 corosync[7769]: [CPG ] chosen downlist: sender
> > r(0) ip(10.14.18.65) ; members(old:4 left:0)
> > Jan 7 21:37:54 hv-1 corosync[7769]: [MAIN ] Completed service
> > synchronization, ready to provide service.
> >
> >
> > Node Sts Inc Joined Name
> > 1 M 4468 2013-12-10 14:33:27 hv-1
> > 2 M 4468 2013-12-10 14:33:27 hv-2
> > 3 M 5036 2014-01-07 17:51:26 hv-3
> > 4 X 4468 hv-4 (dead at the moment)
> > 5 M 4468 2013-12-10 14:33:27 hv-5
> > 6 X 0 hv-6   <--- added
> >
> >
> > totem.interface.member.memberaddr=hv-1
> > totem.interface.member.memberaddr=hv-2
> > totem.interface.member.memberaddr=hv-3
> > totem.interface.member.memberaddr=hv-4
> > totem.interface.member.memberaddr=hv-5
> > runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(10.14.18.65)
> > runtime.totem.pg.mrp.srp.members.1.join_count=1
> > runtime.totem.pg.mrp.srp.members.1.status=joined
> > runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(10.14.18.66)
> > runtime.totem.pg.mrp.srp.members.2.join_count=1
> > runtime.totem.pg.mrp.srp.members.2.status=joined
> > runtime.totem.pg.mrp.srp.members.4.ip=r(0) ip(10.14.18.68)
> > runtime.totem.pg.mrp.srp.members.4.join_count=1
> > runtime.totem.pg.mrp.srp.members.4.status=left
> > runtime.totem.pg.mrp.srp.members.5.ip=r(0) ip(10.14.18.70)
> > runtime.totem.pg.mrp.srp.members.5.join_count=1
> > runtime.totem.pg.mrp.srp.members.5.status=joined
> > runtime.totem.pg.mrp.srp.members.3.ip=r(0) ip(10.14.18.67)
> > runtime.totem.pg.mrp.srp.members.3.join_count=3
> > runtime.totem.pg.mrp.srp.members.3.status=joined
> >
> >
> > cluster.conf:
> >
> > <?xml version="1.0"?>
> > <cluster config_version="32" name="hv-1618-110-1">
> >   <fence_daemon clean_start="0"/>
> >   <cman transport="udpu" expected_votes="1"/>
> >   <logging debug="off"/>
> >   <clusternodes>
> >     <clusternode name="hv-1" votes="1" nodeid="1">
> >       <fence><method name="single"><device name="human"/></method></fence>
> >     </clusternode>
> >     <clusternode name="hv-2" votes="1" nodeid="3">
> >       <fence><method name="single"><device name="human"/></method></fence>
> >     </clusternode>
> >     <clusternode name="hv-3" votes="1" nodeid="4">
> >       <fence><method name="single"><device name="human"/></method></fence>
> >     </clusternode>
> >     <clusternode name="hv-4" votes="1" nodeid="5">
> >       <fence><method name="single"><device name="human"/></method></fence>
> >     </clusternode>
> >     <clusternode name="hv-5" votes="1" nodeid="2">
> >       <fence><method name="single"><device name="human"/></method></fence>
> >     </clusternode>
> >     <clusternode name="hv-6" votes="1" nodeid="6">
> >       <fence><method name="single"><device name="human"/></method></fence>
> >     </clusternode>
> >   </clusternodes>
> >   <fencedevices>
> >     <fencedevice name="human" agent="manual"/>
> >   </fencedevices>
> >   <rm/>
> > </cluster>
> >
> > (manual fencing just for testing)
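> >
> > For production I would swap in an IPMI device, roughly like this (a
> > sketch; name, address and credentials are placeholders):
> >
> >     <fencedevices>
> >       <fencedevice name="ipmi-hv-1" agent="fence_ipmilan"
> >                    ipaddr="10.14.19.65" login="admin" passwd="secret"
> >                    lanplus="1"/>
> >     </fencedevices>
> >     ...
> >     <clusternode name="hv-1" votes="1" nodeid="1">
> >       <fence>
> >         <method name="ipmi"><device name="ipmi-hv-1"/></method>
> >       </fence>
> >     </clusternode>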
> >
> >
> > corosync.conf:
> >
> > compatibility: whitetank
> >
> > totem {
> >     version: 2
> >     secauth: off
> >     threads: 0
> >     # fail_recv_const: 5000
> >     interface {
> >         ringnumber: 0
> >         bindnetaddr: 10.14.18.0
> >         mcastaddr: 239.0.0.4
> >         mcastport: 5405
> >     }
> > }
> >
> > logging {
> >     fileline: off
> >     to_stderr: no
> >     to_logfile: yes
> >     to_syslog: yes
> >     # the pathname of the log file
> >     logfile: /var/log/cluster/corosync.log
> >     debug: off
> >     timestamp: on
> >     logger_subsys {
> >         subsys: AMF
> >         debug: off
> >     }
> > }
> >
> > amf {
> >     mode: disabled
> > }
> >
>
> When using cman, corosync.conf is not used or read; the corosync
> configuration is generated from cluster.conf instead.
>
> Fabio
>
> >
> >
> > On Sat, Feb 22, 2014 at 5:54 AM, Fabio M. Di Nitto <fdinitto at redhat.com> wrote:
> >
> > On 02/22/2014 10:33 AM, emmanuel segura wrote:
> > > As far as I know, if you need to modify anything outside the
> > > <rm>...</rm> tag (used by rgmanager) in the cluster.conf file, you
> > > need to restart the whole cluster stack; with cman+rgmanager I have
> > > never seen a way to add or remove a node from the cluster without
> > > restarting cman.
> >
> > It depends on the version. For RHEL5 that's correct; on RHEL6 it also
> > works for changes outside of <rm>, but there are some limitations, as
> > some parameters just can't be changed at runtime.
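> >
> > On RHEL6 the usual runtime reload is (a sketch; without -S, cman_tool
> > runs ccs_sync to push the updated cluster.conf to all members):
> >
> >     ccs_config_validate
> >     cman_tool version -r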
> >
> > Fabio
> >
> > >
> > >
> > >
> > >
> > > 2014-02-22 6:21 GMT+01:00 Bjoern Teipel <bjoern.teipel at internetbrands.com>:
> > >
> > > Hi all,
> > >
> > > who is using CLVM with CMAN in a cluster with more than two nodes in
> > > production?
> > > Did you manage to live-add a new node to the cluster while
> > > everything is running?
> > > I'm only able to add nodes while the cluster stack is shut down.
> > > That's certainly not a good idea when you have to run CLVM on
> > > hypervisors and need to shut down all VMs just to add a new box.
> > > It would also be good if you could paste some of your configs using
> > > IPMI fencing.
> > >
> > > Thanks in advance,
> > > Bjoern
> > >
> > > --
> > > this is my life and I live it as long as God wills
> > >
> > >
> >
> >
> >
> >
> >
>