From daniel.dehennin at baby-gnu.org Fri Apr 8 09:21:08 2016 From: daniel.dehennin at baby-gnu.org (Daniel Dehennin) Date: Fri, 08 Apr 2016 11:21:08 +0200 Subject: [Linux-cluster] GFS2: debugging I/O issues Message-ID: <87h9fc78xn.fsf@hati.baby-gnu.org> Hello, On our virtualisation infrastructure we have a 4TB GFS2 filesystem over a SAN. For the past week or two we have been facing read I/O issues: 5k or 6k IOPS with an average block size of 5kB. I'm looking into possible causes and haven't found anything yet, so my question is: Is it possible that filling the GFS2 filesystem beyond 80% can produce such a workload? Regards. -- Daniel Dehennin Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 342 bytes Desc: not available URL:
From swhiteho at redhat.com Fri Apr 8 09:28:56 2016 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 8 Apr 2016 10:28:56 +0100 Subject: [Linux-cluster] GFS2: debugging I/O issues In-Reply-To: <87h9fc78xn.fsf@hati.baby-gnu.org> References: <87h9fc78xn.fsf@hati.baby-gnu.org> Message-ID: <570779D8.9070709@redhat.com> Hi, On 08/04/16 10:21, Daniel Dehennin wrote: > Hello, > > On our virtualisation infrastructure we have a 4TB GFS2 filesystem over a SAN. > > For the past week or two we have been facing read I/O issues: 5k or 6k IOPS with > an average block size of 5kB. > > I'm looking into possible causes and haven't found anything yet, so my > question is: > > Is it possible that filling the GFS2 filesystem beyond 80% can produce > such a workload? > > Regards. > > > If you are worried about read I/O, then I'd look carefully at the fragmentation using filefrag on a few representative files to see how they are laid out on disk. There are other possible causes of performance issues too - do you have the fs mounted noatime (which we recommend for most use cases) for example? Running a filesystem which is close to the capacity limit can generate fragmentation over time; 80% would usually be ok, and more recent versions of GFS2 are better than older ones at avoiding fragmentation in such circumstances, Steve. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From daniel.dehennin at baby-gnu.org Fri Apr 8 11:13:02 2016 From: daniel.dehennin at baby-gnu.org (Daniel Dehennin) Date: Fri, 08 Apr 2016 13:13:02 +0200 Subject: [Linux-cluster] GFS2: debugging I/O issues In-Reply-To: <570779D8.9070709@redhat.com> (Steven Whitehouse's message of "Fri, 8 Apr 2016 10:28:56 +0100") References: <87h9fc78xn.fsf@hati.baby-gnu.org> <570779D8.9070709@redhat.com> Message-ID: <87d1q073r5.fsf@hati.baby-gnu.org> Steven Whitehouse writes: > If you are worried about read I/O, then I'd look carefully at the > fragmentation using filefrag on a few representative files to see how > they are laid out on disk. A running qcow2 image, using a backing file: - the running qcow2 is 822MB with 3002 extents - the backing file is 2.2GB with 2893 extents Another saved image, used as a read-only backing file for running VMs, is 7.5GB with 9640 extents. > There are other possible causes of > performance issues too - do you have the fs mounted noatime (which we > recommend for most use cases) for example? Right, I missed that one; I'll need to plan some downtime.
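For reference, the two checks discussed above look roughly like this; the device, mount point and image path are only placeholders, not the real layout of this datastore:

  # extent layout of a representative image
  filefrag -v /var/lib/one/datastores/0/disk.0

  # /etc/fstab entry carrying noatime; it only takes effect at the next mount,
  # and on a cluster filesystem that means every node, hence the downtime
  /dev/vg_san/lv_one  /var/lib/one/datastores  gfs2  noatime,nodiratime  0 0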
> Running a filesystem which is close to the capacity limit can generate > fragmentation over time; 80% would usually be ok, and more recent > versions of GFS2 are better than older ones at avoiding fragmentation > in such circumstances, It's running on Ubuntu Trusty with a 3.13 kernel and gfs2-utils 3.1.6. Thanks. -- Daniel Dehennin Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 342 bytes Desc: not available URL:
From daniel.dehennin at baby-gnu.org Mon Apr 11 12:29:14 2016 From: daniel.dehennin at baby-gnu.org (Daniel Dehennin) Date: Mon, 11 Apr 2016 14:29:14 +0200 Subject: [Linux-cluster] GFS2 and LVM stripes Message-ID: <8737qs72hx.fsf@hati.baby-gnu.org> Hello, My OpenNebula cluster has a 4TB GFS2 logical volume supported by two physical volumes (2TB each). The result is that nearly all I/O goes to a single PV. Now I'm looking at a way to convert the linear LV to a striped one, and have only found the possibility of going through a mirror[1]. Do you have any advice on the use of GFS2 over striped LVM? Regards. Footnotes: [1] http://community.hpe.com/t5/System-Administration/Need-to-move-the-data-from-Linear-LV-to-stripped-LV-on-RHEL-5-7/td-p/6134323 -- Daniel Dehennin Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 342 bytes Desc: not available URL:
From swhiteho at redhat.com Mon Apr 11 12:52:00 2016 From: swhiteho at redhat.com (Steven Whitehouse) Date: Mon, 11 Apr 2016 13:52:00 +0100 Subject: [Linux-cluster] GFS2 and LVM stripes In-Reply-To: <8737qs72hx.fsf@hati.baby-gnu.org> References: <8737qs72hx.fsf@hati.baby-gnu.org> Message-ID: <570B9DF0.9000005@redhat.com> Hi, On 11/04/16 13:29, Daniel Dehennin wrote: > Hello, > > My OpenNebula cluster has a 4TB GFS2 logical volume supported by two > physical volumes (2TB each). > > The result is that nearly all I/O goes to a single PV. > > Now I'm looking at a way to convert the linear LV to a striped one, and > have only found the possibility of going through a mirror[1]. > > Do you have any advice on the use of GFS2 over striped LVM? > > Regards. > > Footnotes: > [1] http://community.hpe.com/t5/System-Administration/Need-to-move-the-data-from-Linear-LV-to-stripped-LV-on-RHEL-5-7/td-p/6134323 > > > It will depend on the workload as to what is the best stripe size to choose, so you might want to try some different sizes to see what will work best in your case, Steve. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From stefano.panella at citrix.com Tue Apr 12 12:45:21 2016 From: stefano.panella at citrix.com (Stefano Panella) Date: Tue, 12 Apr 2016 12:45:21 +0000 Subject: [Linux-cluster] Help with corosync and GFS2 on multi network setup Message-ID: Hi everybody, we have been using corosync directly to provide clustering for GFS2 on our centos 7.2 pools with only one network interface and all has been working great so far!
We now have a new set-up with two network interfaces for every host in the cluster: A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X) B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X) when we run corosync in this mode we get the logs continuously spammed by messages like these: [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout). [12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep. [12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10 [12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750 [12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state. [12880] cl15-02 corosyncdebug [TOTEM ] got commit token [12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state. [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41: [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47: [12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41: [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 [12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47: [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 [12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery. [12880] cl15-02 corosyncdebug [TOTEM ] got commit token [12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0 [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0 [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0 [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0 [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 [12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0 [12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state [12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0 [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1 Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) [12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state. [12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. 
Members [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157) [12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158) [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2 [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0) [12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1 [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01 Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22) [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677 Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22) [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873 [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 [12880] cl15-02 corosyncdebug 
[VOTEQ ] node 2 state=1, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0 [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 [12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2 [12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56 [12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service. [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0 [12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0 [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join). and we do not get them when there is only a single network interface in the systems. 
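One thing worth double-checking while comparing the two set-ups: with "transport: udpu", corosync binds to whichever local address matches the node's own ring0_addr entry, so if the cl15-0x names can resolve to more than one address it is safer to put the 1 Gbit addresses (taken from the configuration shown below) directly into the nodelist. A sketch, not a complete corosync.conf:

  nodelist {
      node {
          ring0_addr: 10.220.88.41
          nodeid: 1
      }
      node {
          ring0_addr: 10.220.88.47
          nodeid: 2
      }
      node {
          ring0_addr: 10.220.88.48
          nodeid: 3
      }
  }

This only pins the totem traffic to the intended network; the iSCSI traffic stays on the 10 Gbit interface regardless.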
-------------------------------------------------------------------------------------- These are the network configurations on the three hosts: [root at cl15-02 ~]# ifconfig | grep inet inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255 inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255 inet 127.0.0.1 netmask 255.0.0.0 [root at cl15-08 ~]# ifconfig | grep inet inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255 inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255 inet 127.0.0.1 netmask 255.0.0.0 [root at cl15-09 ~]# ifconfig | grep inet inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255 inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255 inet 127.0.0.1 netmask 255.0.0.0 ----------------------------------------------------------------------------------- corosync-quorumtool output: [root at cl15-02 ~]# corosync-quorumtool Quorum information ------------------ Date: Mon Apr 11 15:46:26 2016 Quorum provider: corosync_votequorum Nodes: 3 Node ID: 1 Ring ID: 18952 Quorate: Yes Votequorum information ---------------------- Expected votes: 3 Highest expected: 3 Total votes: 3 Quorum: 2 Flags: Quorate Membership information ---------------------- Nodeid Votes Name 1 1 cl15-02 (local) 2 1 cl15-08 3 1 cl15-09 --------------------------------------------------------------------------- /etc/corosync/corosync.conf: [root at cl15-02 ~]# cat /etc/corosync/corosync.conf totem { version: 2 secauth: off cluster_name: gfs_cluster transport: udpu } nodelist { node { ring0_addr: cl15-02 nodeid: 1 } node { ring0_addr: cl15-08 nodeid: 2 } node { ring0_addr: cl15-09 nodeid: 3 } } quorum { provider: corosync_votequorum } logging { debug: on to_logfile: yes logfile: /var/log/cluster/corosync.log to_syslog: yes } From ccaulfie at redhat.com Tue Apr 12 13:28:01 2016 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 12 Apr 2016 14:28:01 +0100 Subject: [Linux-cluster] Help with corosync and GFS2 on multi network setup In-Reply-To: References: Message-ID: <570CF7E1.3090309@redhat.com> On 12/04/16 13:45, Stefano Panella wrote: > Hi everybody, > > we have been using corosync directly to provide clustering for GFS2 on our centos 7.2 pools with only one network interface and all has been working great so far! > > We now have a new set-up with two network interfaces for every host in the cluster: > A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X) > B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X) > > when we run corosync in this mode we get the logs continuously spammed by messages like these: > > [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout). > [12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep. > [12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10 > [12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750 > [12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state. > [12880] cl15-02 corosyncdebug [TOTEM ] got commit token > [12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state. 
> [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41: > [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47: > [12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41: > [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 > [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 > [12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47: > [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 > [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 > > [12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery. > [12880] cl15-02 corosyncdebug [TOTEM ] got commit token > [12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0 > [12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state > [12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0 > [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1 > Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) > Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) > [12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state. > [12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. 
Members > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157) > [12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158) > [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2 > [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1 > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01 > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22) > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677 > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22) > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 
1 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0 > [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 > [12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2 > [12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56 > [12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service. > [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0 > [12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0 > [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join). > > > and we do not get them when there is only a single network interface in the systems. 
> > -------------------------------------------------------------------------------------- > These are the network configurations on the three hosts: > > [root at cl15-02 ~]# ifconfig | grep inet > inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > [root at cl15-08 ~]# ifconfig | grep inet > inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > [root at cl15-09 ~]# ifconfig | grep inet > inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > ----------------------------------------------------------------------------------- > corosync-quorumtool output: > > [root at cl15-02 ~]# corosync-quorumtool > Quorum information > ------------------ > Date: Mon Apr 11 15:46:26 2016 > Quorum provider: corosync_votequorum > Nodes: 3 > Node ID: 1 > Ring ID: 18952 > Quorate: Yes > > Votequorum information > ---------------------- > Expected votes: 3 > Highest expected: 3 > Total votes: 3 > Quorum: 2 > Flags: Quorate > > Membership information > ---------------------- > Nodeid Votes Name > 1 1 cl15-02 (local) > 2 1 cl15-08 > 3 1 cl15-09 > > --------------------------------------------------------------------------- > /etc/corosync/corosync.conf: > > [root at cl15-02 ~]# cat /etc/corosync/corosync.conf > totem { > version: 2 > secauth: off > cluster_name: gfs_cluster > transport: udpu > } > > nodelist { > node { > ring0_addr: cl15-02 > nodeid: 1 > } > > node { > ring0_addr: cl15-08 > nodeid: 2 > } > > node { > ring0_addr: cl15-09 > nodeid: 3 > } > } > > quorum { > provider: corosync_votequorum > } > > logging { > debug: on You have debug logging on. At a guess I would say that the config file with the other interface in it doesn't :) Chrissie > to_logfile: yes > logfile: /var/log/cluster/corosync.log > to_syslog: yes > } > From stefano.panella at citrix.com Tue Apr 12 14:02:02 2016 From: stefano.panella at citrix.com (Stefano Panella) Date: Tue, 12 Apr 2016 14:02:02 +0000 Subject: [Linux-cluster] Help with corosync and GFS2 on multi network setup In-Reply-To: <570CF7E1.3090309@redhat.com> References: , <570CF7E1.3090309@redhat.com> Message-ID: <1460469693577.36082@citrix.com> Hi Christine, thanks for your input. I have checked and in the configuration with only one network I have debugging turned on as well (same corosync.conf files). These messages are repeating every 1-2 seconds and the reason why I think there is something wrong is that if I do operation on a sqlite3 db on the GFS2 filesystem the operations are much slower when I have the secondary network as well (and the extra logging) If I try to strace the sqlite3 command, it is stuck for few seconds (very similar to the period of the logging repeating) in a fcntl system call needed to lock the db file ________________________________________ From: linux-cluster-bounces at redhat.com on behalf of Christine Caulfield Sent: Tuesday, April 12, 2016 2:28 PM To: linux-cluster at redhat.com Subject: Re: [Linux-cluster] Help with corosync and GFS2 on multi network setup On 12/04/16 13:45, Stefano Panella wrote: > Hi everybody, > > we have been using corosync directly to provide clustering for GFS2 on our centos 7.2 pools with only one network interface and all has been working great so far! 
> > We now have a new set-up with two network interfaces for every host in the cluster: > A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X) > B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X) > > when we run corosync in this mode we get the logs continuously spammed by messages like these: > > [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout). > [12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep. > [12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10 > [12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750 > [12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state. > [12880] cl15-02 corosyncdebug [TOTEM ] got commit token > [12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state. > [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41: > [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47: > [12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41: > [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 > [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 > [12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47: > [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 > [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 > > [12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery. > [12880] cl15-02 corosyncdebug [TOTEM ] got commit token > [12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0 > [12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state > [12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0 > [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1 > Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) > Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) > [12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state. > [12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. 
Members > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157) > [12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158) > [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2 > [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1 > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01 > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22) > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677 > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22) > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 
1 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0 > [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 > [12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2 > [12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56 > [12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service. > [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0 > [12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0 > [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join). > > > and we do not get them when there is only a single network interface in the systems. 
> > -------------------------------------------------------------------------------------- > These are the network configurations on the three hosts: > > [root at cl15-02 ~]# ifconfig | grep inet > inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > [root at cl15-08 ~]# ifconfig | grep inet > inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > [root at cl15-09 ~]# ifconfig | grep inet > inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > ----------------------------------------------------------------------------------- > corosync-quorumtool output: > > [root at cl15-02 ~]# corosync-quorumtool > Quorum information > ------------------ > Date: Mon Apr 11 15:46:26 2016 > Quorum provider: corosync_votequorum > Nodes: 3 > Node ID: 1 > Ring ID: 18952 > Quorate: Yes > > Votequorum information > ---------------------- > Expected votes: 3 > Highest expected: 3 > Total votes: 3 > Quorum: 2 > Flags: Quorate > > Membership information > ---------------------- > Nodeid Votes Name > 1 1 cl15-02 (local) > 2 1 cl15-08 > 3 1 cl15-09 > > --------------------------------------------------------------------------- > /etc/corosync/corosync.conf: > > [root at cl15-02 ~]# cat /etc/corosync/corosync.conf > totem { > version: 2 > secauth: off > cluster_name: gfs_cluster > transport: udpu > } > > nodelist { > node { > ring0_addr: cl15-02 > nodeid: 1 > } > > node { > ring0_addr: cl15-08 > nodeid: 2 > } > > node { > ring0_addr: cl15-09 > nodeid: 3 > } > } > > quorum { > provider: corosync_votequorum > } > > logging { > debug: on You have debug logging on. At a guess I would say that the config file with the other interface in it doesn't :) Chrissie > to_logfile: yes > logfile: /var/log/cluster/corosync.log > to_syslog: yes > } > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From ccaulfie at redhat.com Tue Apr 12 15:53:08 2016 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 12 Apr 2016 16:53:08 +0100 Subject: [Linux-cluster] Help with corosync and GFS2 on multi network setup In-Reply-To: <1460469693577.36082@citrix.com> References: <570CF7E1.3090309@redhat.com> <1460469693577.36082@citrix.com> Message-ID: <570D19E4.5090203@redhat.com> On 12/04/16 15:02, Stefano Panella wrote: > Hi Christine, > > thanks for your input. I have checked and in the configuration with only one network I have debugging turned on as well (same corosync.conf files). > > These messages are repeating every 1-2 seconds and the reason why I think there is something wrong is that if I do operation on a sqlite3 db on the GFS2 filesystem the operations are much slower when I have the secondary network as well (and the extra logging) > The messages are just debugging messages - they are not indicative of any problem. If anything they show that everything is fine - with corosync at least. They will slow things down a little though. 
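In that case the flood can be silenced by flipping the flag in the logging section of the corosync.conf quoted below, for example:

  logging {
      debug: off
      to_logfile: yes
      logfile: /var/log/cluster/corosync.log
      to_syslog: yes
  }

(corosync then needs a restart, or a configuration reload if the version in use supports it, to pick the change up).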
Chrissie > If I try to strace the sqlite3 command, it is stuck for few seconds (very similar to the period of the logging repeating) in a fcntl system call needed to lock the db file > ________________________________________ > From: linux-cluster-bounces at redhat.com on behalf of Christine Caulfield > Sent: Tuesday, April 12, 2016 2:28 PM > To: linux-cluster at redhat.com > Subject: Re: [Linux-cluster] Help with corosync and GFS2 on multi network setup > > On 12/04/16 13:45, Stefano Panella wrote: >> Hi everybody, >> >> we have been using corosync directly to provide clustering for GFS2 on our centos 7.2 pools with only one network interface and all has been working great so far! >> >> We now have a new set-up with two network interfaces for every host in the cluster: >> A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X) >> B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X) >> >> when we run corosync in this mode we get the logs continuously spammed by messages like these: >> >> [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout). >> [12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep. >> [12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10 >> [12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750 >> [12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state. >> [12880] cl15-02 corosyncdebug [TOTEM ] got commit token >> [12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state. >> [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41: >> [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47: >> [12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41: >> [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 >> [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 >> [12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47: >> [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 >> [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 >> >> [12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery. 
>> [12880] cl15-02 corosyncdebug [TOTEM ] got commit token >> [12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token >> [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state >> [12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0 >> [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1 >> Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) >> Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) >> [12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state. >> [12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. Members >> [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access >> Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157) >> [12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action >> Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158) >> [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2 >> [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) >> [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0) >> [12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) >> [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1 >> [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01 >> Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22) >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677 >> Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22) >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378 >> [12880] 
cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873 >> [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No >> [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No >> [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 >> [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 >> [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No >> [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 >> [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 >> [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0 >> [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 >> [12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2 >> [12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56 >> [12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service. 
>> [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0 >> [12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0 >> [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join). >> >> >> and we do not get them when there is only a single network interface in the systems. >> >> -------------------------------------------------------------------------------------- >> These are the network configurations on the three hosts: >> >> [root at cl15-02 ~]# ifconfig | grep inet >> inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255 >> inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255 >> inet 127.0.0.1 netmask 255.0.0.0 >> >> [root at cl15-08 ~]# ifconfig | grep inet >> inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255 >> inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255 >> inet 127.0.0.1 netmask 255.0.0.0 >> >> [root at cl15-09 ~]# ifconfig | grep inet >> inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255 >> inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255 >> inet 127.0.0.1 netmask 255.0.0.0 >> >> ----------------------------------------------------------------------------------- >> corosync-quorumtool output: >> >> [root at cl15-02 ~]# corosync-quorumtool >> Quorum information >> ------------------ >> Date: Mon Apr 11 15:46:26 2016 >> Quorum provider: corosync_votequorum >> Nodes: 3 >> Node ID: 1 >> Ring ID: 18952 >> Quorate: Yes >> >> Votequorum information >> ---------------------- >> Expected votes: 3 >> Highest expected: 3 >> Total votes: 3 >> Quorum: 2 >> Flags: Quorate >> >> Membership information >> ---------------------- >> Nodeid Votes Name >> 1 1 cl15-02 (local) >> 2 1 cl15-08 >> 3 1 cl15-09 >> >> --------------------------------------------------------------------------- >> /etc/corosync/corosync.conf: >> >> [root at cl15-02 ~]# cat /etc/corosync/corosync.conf >> totem { >> version: 2 >> secauth: off >> cluster_name: gfs_cluster >> transport: udpu >> } >> >> nodelist { >> node { >> ring0_addr: cl15-02 >> nodeid: 1 >> } >> >> node { >> ring0_addr: cl15-08 >> nodeid: 2 >> } >> >> node { >> ring0_addr: cl15-09 >> nodeid: 3 >> } >> } >> >> quorum { >> provider: corosync_votequorum >> } >> >> logging { >> debug: on > > > You have debug logging on. At a guess I would say that the config file > with the other interface in it doesn't :) > > Chrissie > > >> to_logfile: yes >> logfile: /var/log/cluster/corosync.log >> to_syslog: yes >> } >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jonathan.davies at citrix.com Fri Apr 15 14:55:02 2016 From: jonathan.davies at citrix.com (Jonathan Davies) Date: Fri, 15 Apr 2016 15:55:02 +0100 Subject: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss Message-ID: <571100C6.1050606@citrix.com> Dear linux-cluster, I have made some observations about the behaviour of gfs2 and would appreciate confirmation of whether this is expected behaviour or something has gone wrong. I have a three-node cluster -- let's call the nodes A, B and C. On each of nodes A and B, I have a loop that repeatedly writes an increasing integer value to a file in the GFS2-mountpoint. On node C, I have a loop that reads from both these files from the GFS2-mountpoint. The reads on node C show the latest values written by A and B, and stay up-to-date. All good so far. 
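As a rough sketch of what those two loops look like (the file names here are made up, and the real test -- see the notes further down -- writes through an O_DIRECT|O_SYNC file descriptor, which dd approximates with oflag=direct,sync):

  # writer loop on node A; node B runs the same against its own file
  i=0
  while true; do
      printf '%016d' "$i" | dd of=/mnt/gfs2/counter-a bs=4096 count=1 \
          conv=sync,notrunc oflag=direct,sync 2>/dev/null
      i=$((i + 1))
  done

  # reader loop on node C, polling both files
  while true; do
      for f in /mnt/gfs2/counter-a /mnt/gfs2/counter-b; do
          dd if="$f" bs=4096 count=1 iflag=direct 2>/dev/null | tr -d '\0'
          echo
      done
      sleep 1
  done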
I then cause node A to drop the corosync heartbeat by executing the following on node A: iptables -I INPUT -p udp --dport 5404 -j DROP iptables -I INPUT -p udp --dport 5405 -j DROP iptables -I INPUT -p tcp --dport 21064 -j DROP After a few seconds, I normally observe that all I/O to the GFS2 filesystem hangs forever on node A: the latest value read by node C is the same as the last successful write by node A. This is exactly the behaviour I want -- I want to be sure that node A never completes I/O that is not able to be seen by other nodes. However, on some occasions, I observe that node A continues in the loop believing that it is successfully writing to the file but, according to node C, the file stops being updated. (Meanwhile, the file written by node B continues to be up-to-date as read by C.) This is concerning -- it looks like I/O writes are being completed on node A even though other nodes in the cluster cannot see the results. I performed this test 20 times, rebooting node A between each, and saw the "I/O hanging" behaviour 16 times and the "I/O appears to continue" behaviour 4 times. I couldn't see anything that might cause it to sometimes adopt one behaviour and sometimes the other. So... is this expected? Should I be able to rely upon I/O hanging? Or have I misconfigured something? Advice would be appreciated. Thanks, Jonathan Notes: * The I/O from node A uses an fd that is O_DIRECT|O_SYNC, so the page cache is not involved. * Versions: corosync 2.3.4, dlm_controld 4.0.2, gfs2 as per RHEL 7.2. * I don't see anything particularly useful being logged. Soon after I insert the iptables rules on node A, I see the following on node A: 2016-04-15T14:15:45.608175+00:00 localhost corosync[3074]: [TOTEM ] The token was lost in the OPERATIONAL state. 2016-04-15T14:15:45.608191+00:00 localhost corosync[3074]: [TOTEM ] A processor failed, forming new configuration. 2016-04-15T14:15:45.608198+00:00 localhost corosync[3074]: [TOTEM ] entering GATHER state from 2(The token was lost in the OPERATIONAL state.). Around the time node C sees the output from node A stop changing, node A reports: 2016-04-15T14:15:58.388404+00:00 localhost corosync[3074]: [TOTEM ] entering GATHER state from 0(consensus timeout). * corosync.conf: totem { version: 2 secauth: off cluster_name: 1498d523 transport: udpu token_retransmits_before_loss_const: 10 token: 10000 } logging { debug: on } quorum { provider: corosync_votequorum } nodelist { node { ring0_addr: 10.220.73.6 } node { ring0_addr: 10.220.73.7 } node { ring0_addr: 10.220.73.3 } } From rpeterso at redhat.com Fri Apr 15 15:14:28 2016 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 15 Apr 2016 11:14:28 -0400 (EDT) Subject: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss In-Reply-To: <571100C6.1050606@citrix.com> References: <571100C6.1050606@citrix.com> Message-ID: <1642816801.51620371.1460733268322.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Dear linux-cluster, > > I have made some observations about the behaviour of gfs2 and would > appreciate confirmation of whether this is expected behaviour or > something has gone wrong. > > I have a three-node cluster -- let's call the nodes A, B and C. On each > of nodes A and B, I have a loop that repeatedly writes an increasing > integer value to a file in the GFS2-mountpoint. On node C, I have a loop > that reads from both these files from the GFS2-mountpoint. The reads on > node C show the latest values written by A and B, and stay up-to-date. > All good so far. 
> > I then cause node A to drop the corosync heartbeat by executing the > following on node A: > > iptables -I INPUT -p udp --dport 5404 -j DROP > iptables -I INPUT -p udp --dport 5405 -j DROP > iptables -I INPUT -p tcp --dport 21064 -j DROP > > After a few seconds, I normally observe that all I/O to the GFS2 > filesystem hangs forever on node A: the latest value read by node C is > the same as the last successful write by node A. This is exactly the > behaviour I want -- I want to be sure that node A never completes I/O > that is not able to be seen by other nodes. > > However, on some occasions, I observe that node A continues in the loop > believing that it is successfully writing to the file but, according to > node C, the file stops being updated. (Meanwhile, the file written by > node B continues to be up-to-date as read by C.) This is concerning -- > it looks like I/O writes are being completed on node A even though other > nodes in the cluster cannot see the results. > > I performed this test 20 times, rebooting node A between each, and saw > the "I/O hanging" behaviour 16 times and the "I/O appears to continue" > behaviour 4 times. I couldn't see anything that might cause it to > sometimes adopt one behaviour and sometimes the other. > > So... is this expected? Should I be able to rely upon I/O hanging? Or > have I misconfigured something? Advice would be appreciated. > > Thanks, > Jonathan Hi Jonathan, This seems like expected behavior to me. It probably all goes back to whatever node "masters" the glock and the node that "owns" the glock, when communications are lost. In your test, the DLM lock is being traded back and forth between the file's writer on A and the file's reader on C. Then communication to the DLM is blocked. When that happens, if the reader (C) happens to own the DLM lock when it loses DLM communications, the writer will block on DLM, and can't write a new value. The reader owns the lock, so it keeps reading the same value over and over. However, if A happens to own the DLM lock, it does not need to ask DLM's permission because it owns the lock. Therefore, it goes on writing. Meanwhile, the other node can't get DLM's permission to get the lock back, so it hangs. There's also the problem of the DLM lock "master" which presents another level of complexity to the mix, but let's not go into that now. Suffice it to say I think it's working as expected. Regards, Bob Peterson Red Hat File Systems From teigland at redhat.com Fri Apr 15 16:14:37 2016 From: teigland at redhat.com (David Teigland) Date: Fri, 15 Apr 2016 11:14:37 -0500 Subject: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss In-Reply-To: <1642816801.51620371.1460733268322.JavaMail.zimbra@redhat.com> References: <571100C6.1050606@citrix.com> <1642816801.51620371.1460733268322.JavaMail.zimbra@redhat.com> Message-ID: <20160415161437.GB10934@redhat.com> > > However, on some occasions, I observe that node A continues in the loop > > believing that it is successfully writing to the file node A has the exclusive lock, so it continues writing... > > but, according to > > node C, the file stops being updated. (Meanwhile, the file written by > > node B continues to be up-to-date as read by C.) This is concerning -- > > it looks like I/O writes are being completed on node A even though other > > nodes in the cluster cannot see the results. Is node C blocked trying to read the file A is writing? That's what we'd expect until recovery has removed node A.
Or are C's reads completing while A continues writing the file? That would not be correct. > However, if A happens to own the DLM lock, it does not need > to ask DLM's permission because it owns the lock. Therefore, it goes > on writing. Meanwhile, the other node can't get DLM's permission to > get the lock back, so it hangs. The description sounds like C might not be hanging in read as we'd expect while A continues writing. If that's the case, then it implies that dlm recovery has been completed by nodes B and C (removing A), which allows the lock to be granted to C for reading. If dlm recovery on B/C has completed, it means that A should have been fenced, so A should not be able to write once C is given the lock. Dave From jonathan.davies at citrix.com Mon Apr 18 13:12:58 2016 From: jonathan.davies at citrix.com (Jonathan Davies) Date: Mon, 18 Apr 2016 14:12:58 +0100 Subject: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss In-Reply-To: <20160415161437.GB10934@redhat.com> References: <571100C6.1050606@citrix.com> <1642816801.51620371.1460733268322.JavaMail.zimbra@redhat.com> <20160415161437.GB10934@redhat.com> Message-ID: <5714DD5A.2040400@citrix.com> On 15/04/16 17:14, David Teigland wrote: >>> However, on some occasions, I observe that node A continues in the loop >>> believing that it is successfully writing to the file > > node A has the exclusive lock, so it continues writing... > >>> but, according to >>> node C, the file stops being updated. (Meanwhile, the file written by >>> node B continues to be up-to-date as read by C.) This is concerning -- >>> it looks like I/O writes are being completed on node A even though other >>> nodes in the cluster cannot see the results. > > Is node C blocked trying to read the file A is writing? That's what we'd > expect until recovery has removed node A. Or are C's reads completing > while A continues writing the file? That would not be correct. > >> However, if A happens to own the DLM lock, it does not need >> to ask DLM's permission because it owns the lock. Therefore, it goes >> on writing. Meanwhile, the other node can't get DLM's permission to >> get the lock back, so it hangs. > > The description sounds like C might not be hanging in read as we'd expect > while A continues writing. If that's the case, then it implies that dlm > recovery has been completed by nodes B and C (removing A), which allows > the lock to be granted to C for reading. If dlm recovery on B/C has > completed, it means that A should have been fenced, so A should not be > able to write once C is given the lock. Thanks Bob and Dave for your very helpful insights. Your line of reasoning led me to realise that I am running dlm with fencing disabled, which explains everything. Node C was not hanging in read while A continued to write; it was constantly returning an old value. I presume that's legitimate: C believes the value it saw last must still be up-to-date because, as far as C is concerned, A must have been fenced and so couldn't have updated it. (It also explains why I didn't see anything useful in the logs.) When I run the same test with fencing enabled, then although A continues writing after the failure, the read on C hangs until A is fenced, at which point it is able to read the last value A wrote. That's exactly what I want. Apologies for the noise, and thanks for the explanations. Jonathan
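The practical difference between the two runs comes down to dlm_controld's fencing configuration. A minimal sketch of the two settings being compared -- assuming the stock /etc/dlm/dlm.conf location and the enable_fencing option described in dlm.conf(5); a working fence agent or Pacemaker stonith setup still has to be configured separately for fencing to actually do anything -- might look like this:

  # /etc/dlm/dlm.conf -- illustrative sketch, not the poster's actual file

  # What the failing runs above correspond to: dlm recovery completes
  # without ever fencing the lost node, so node A can keep writing while
  # B and C carry on without it.
  enable_fencing=0

  # What the corrected behaviour corresponds to (fencing enabled is the
  # default): reads on C block until node A has really been fenced.
  # enable_fencing=1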