From daniel.dehennin at baby-gnu.org Fri Apr 8 09:21:08 2016 From: daniel.dehennin at baby-gnu.org (Daniel Dehennin) Date: Fri, 08 Apr 2016 11:21:08 +0200 Subject: [Linux-cluster] GFS2: debugging I/O issues Message-ID: <87h9fc78xn.fsf@hati.baby-gnu.org> Hello, On our virtualisation infrastructure we have a 4TB GFS2 filesystem over a SAN. For the past week or two we have been facing read I/O issues: 5k or 6k IOPS with an average block size of 5kB. I'm looking into possible causes and haven't found anything yet, so my question is: Is it possible that filling the GFS2 filesystem beyond 80% can produce such a workload? Regards. -- Daniel Dehennin Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 342 bytes Desc: not available URL:
From swhiteho at redhat.com Fri Apr 8 09:28:56 2016 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 8 Apr 2016 10:28:56 +0100 Subject: [Linux-cluster] GFS2: debugging I/O issues In-Reply-To: <87h9fc78xn.fsf@hati.baby-gnu.org> References: <87h9fc78xn.fsf@hati.baby-gnu.org> Message-ID: <570779D8.9070709@redhat.com> Hi, On 08/04/16 10:21, Daniel Dehennin wrote: > Hello, > > On our virtualisation infrastructure we have a 4TB GFS2 filesystem over a SAN. > > For the past week or two we have been facing read I/O issues: 5k or 6k IOPS with > an average block size of 5kB. > > I'm looking into possible causes and haven't found anything yet, so my > question is: > > Is it possible that filling the GFS2 filesystem beyond 80% can produce > such a workload? > > Regards. > > > If you are worried about read I/O, then I'd look carefully at the fragmentation using filefrag on a few representative files to see how they are laid out on disk. There are other possible causes of performance issues too - do you have the fs mounted noatime (which we recommend for most use cases) for example? Running a filesystem which is close to the capacity limit can generate fragmentation over time; 80% would usually be ok, and more recent versions of GFS2 are better than older ones at avoiding fragmentation in such circumstances, Steve. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From daniel.dehennin at baby-gnu.org Fri Apr 8 11:13:02 2016 From: daniel.dehennin at baby-gnu.org (Daniel Dehennin) Date: Fri, 08 Apr 2016 13:13:02 +0200 Subject: [Linux-cluster] GFS2: debugging I/O issues In-Reply-To: <570779D8.9070709@redhat.com> (Steven Whitehouse's message of "Fri, 8 Apr 2016 10:28:56 +0100") References: <87h9fc78xn.fsf@hati.baby-gnu.org> <570779D8.9070709@redhat.com> Message-ID: <87d1q073r5.fsf@hati.baby-gnu.org> Steven Whitehouse writes: > If you are worried about read I/O, then I'd look carefully at the > fragmentation using filefrag on a few representative files to see how > they are laid out on disk. A running qcow2 image, using a backing file: - the running qcow2 is 822MB with 3002 extents - the backing file is 2.2GB with 2893 extents Another saved image, used as a read-only backing file for running VMs, is 7.5GB with 9640 extents. > There are other possible causes of > performance issues too - do you have the fs mounted noatime (which we > recommend for most use cases) for example? Right, I missed that one; I'll need to plan some downtime.
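For reference, the two checks discussed above look roughly like this; the device, mount point and image path are only placeholders, not the real layout of this datastore:

  # extent layout of a representative image
  filefrag -v /var/lib/one/datastores/0/disk.0

  # /etc/fstab entry carrying noatime; it only takes effect at the next mount,
  # and on a cluster filesystem that means every node, hence the downtime
  /dev/vg_san/lv_one  /var/lib/one/datastores  gfs2  noatime,nodiratime  0 0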
> Running a filesystem which is close to the capacity limit can generate > fragmentation over time; 80% would usually be ok, and more recent > versions of GFS2 are better than older ones at avoiding fragmentation > in such circumstances, It's running on Ubuntu Trusty with a 3.13 kernel and gfs2-utils 3.1.6. Thanks. -- Daniel Dehennin Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 342 bytes Desc: not available URL:
From daniel.dehennin at baby-gnu.org Mon Apr 11 12:29:14 2016 From: daniel.dehennin at baby-gnu.org (Daniel Dehennin) Date: Mon, 11 Apr 2016 14:29:14 +0200 Subject: [Linux-cluster] GFS2 and LVM stripes Message-ID: <8737qs72hx.fsf@hati.baby-gnu.org> Hello, My OpenNebula cluster has a 4TB GFS2 logical volume supported by two physical volumes (2TB each). The result is that nearly all I/O goes to a single PV. Now I'm looking at a way to convert the linear LV to a striped one, and have only found the possibility of going through a mirror[1]. Do you have any advice on the use of GFS2 over striped LVM? Regards. Footnotes: [1] http://community.hpe.com/t5/System-Administration/Need-to-move-the-data-from-Linear-LV-to-stripped-LV-on-RHEL-5-7/td-p/6134323 -- Daniel Dehennin Récupérer ma clef GPG: gpg --recv-keys 0xCC1E9E5B7A6FE2DF Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 342 bytes Desc: not available URL:
From swhiteho at redhat.com Mon Apr 11 12:52:00 2016 From: swhiteho at redhat.com (Steven Whitehouse) Date: Mon, 11 Apr 2016 13:52:00 +0100 Subject: [Linux-cluster] GFS2 and LVM stripes In-Reply-To: <8737qs72hx.fsf@hati.baby-gnu.org> References: <8737qs72hx.fsf@hati.baby-gnu.org> Message-ID: <570B9DF0.9000005@redhat.com> Hi, On 11/04/16 13:29, Daniel Dehennin wrote: > Hello, > > My OpenNebula cluster has a 4TB GFS2 logical volume supported by two > physical volumes (2TB each). > > The result is that nearly all I/O goes to a single PV. > > Now I'm looking at a way to convert the linear LV to a striped one, and > have only found the possibility of going through a mirror[1]. > > Do you have any advice on the use of GFS2 over striped LVM? > > Regards. > > Footnotes: > [1] http://community.hpe.com/t5/System-Administration/Need-to-move-the-data-from-Linear-LV-to-stripped-LV-on-RHEL-5-7/td-p/6134323 > > > It will depend on the workload as to what is the best stripe size to choose, so you might want to try some different sizes to see what will work best in your case, Steve. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From stefano.panella at citrix.com Tue Apr 12 12:45:21 2016 From: stefano.panella at citrix.com (Stefano Panella) Date: Tue, 12 Apr 2016 12:45:21 +0000 Subject: [Linux-cluster] Help with corosync and GFS2 on multi network setup Message-ID: Hi everybody, we have been using corosync directly to provide clustering for GFS2 on our centos 7.2 pools with only one network interface and all has been working great so far!
We now have a new set-up with two network interfaces for every host in the cluster: A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X) B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X) when we run corosync in this mode we get the logs continuously spammed by messages like these: [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout). [12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep. [12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10 [12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750 [12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state. [12880] cl15-02 corosyncdebug [TOTEM ] got commit token [12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state. [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41: [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47: [12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41: [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 [12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47: [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 [12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery. [12880] cl15-02 corosyncdebug [TOTEM ] got commit token [12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0 [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0 [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0 [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0 [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 [12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0 [12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state [12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0 [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1 Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) [12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state. [12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. 
Members [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157) [12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158) [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2 [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0) [12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1 [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01 Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22) [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677 Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22) [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874 [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873 [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 [12880] cl15-02 corosyncdebug 
[VOTEQ ] node 2 state=1, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0 [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 [12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2 [12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56 [12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service. [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0 [12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0 [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join). and we do not get them when there is only a single network interface in the systems. 
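One thing worth double-checking while comparing the two set-ups: with "transport: udpu", corosync binds to whichever local address matches the node's own ring0_addr entry, so if the cl15-0x names can resolve to more than one address it is safer to put the 1 Gbit addresses (taken from the configuration shown below) directly into the nodelist. A sketch, not a complete corosync.conf:

  nodelist {
      node {
          ring0_addr: 10.220.88.41
          nodeid: 1
      }
      node {
          ring0_addr: 10.220.88.47
          nodeid: 2
      }
      node {
          ring0_addr: 10.220.88.48
          nodeid: 3
      }
  }

This only pins the totem traffic to the intended network; the iSCSI traffic stays on the 10 Gbit interface regardless.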
-------------------------------------------------------------------------------------- These are the network configurations on the three hosts: [root at cl15-02 ~]# ifconfig | grep inet inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255 inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255 inet 127.0.0.1 netmask 255.0.0.0 [root at cl15-08 ~]# ifconfig | grep inet inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255 inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255 inet 127.0.0.1 netmask 255.0.0.0 [root at cl15-09 ~]# ifconfig | grep inet inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255 inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255 inet 127.0.0.1 netmask 255.0.0.0 ----------------------------------------------------------------------------------- corosync-quorumtool output: [root at cl15-02 ~]# corosync-quorumtool Quorum information ------------------ Date: Mon Apr 11 15:46:26 2016 Quorum provider: corosync_votequorum Nodes: 3 Node ID: 1 Ring ID: 18952 Quorate: Yes Votequorum information ---------------------- Expected votes: 3 Highest expected: 3 Total votes: 3 Quorum: 2 Flags: Quorate Membership information ---------------------- Nodeid Votes Name 1 1 cl15-02 (local) 2 1 cl15-08 3 1 cl15-09 --------------------------------------------------------------------------- /etc/corosync/corosync.conf: [root at cl15-02 ~]# cat /etc/corosync/corosync.conf totem { version: 2 secauth: off cluster_name: gfs_cluster transport: udpu } nodelist { node { ring0_addr: cl15-02 nodeid: 1 } node { ring0_addr: cl15-08 nodeid: 2 } node { ring0_addr: cl15-09 nodeid: 3 } } quorum { provider: corosync_votequorum } logging { debug: on to_logfile: yes logfile: /var/log/cluster/corosync.log to_syslog: yes } From ccaulfie at redhat.com Tue Apr 12 13:28:01 2016 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 12 Apr 2016 14:28:01 +0100 Subject: [Linux-cluster] Help with corosync and GFS2 on multi network setup In-Reply-To: References: Message-ID: <570CF7E1.3090309@redhat.com> On 12/04/16 13:45, Stefano Panella wrote: > Hi everybody, > > we have been using corosync directly to provide clustering for GFS2 on our centos 7.2 pools with only one network interface and all has been working great so far! > > We now have a new set-up with two network interfaces for every host in the cluster: > A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X) > B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X) > > when we run corosync in this mode we get the logs continuously spammed by messages like these: > > [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout). > [12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep. > [12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10 > [12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750 > [12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state. > [12880] cl15-02 corosyncdebug [TOTEM ] got commit token > [12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state. 
> [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41: > [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47: > [12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41: > [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 > [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 > [12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47: > [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 > [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 > > [12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery. > [12880] cl15-02 corosyncdebug [TOTEM ] got commit token > [12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0 > [12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state > [12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0 > [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1 > Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) > Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) > [12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state. > [12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. 
Members > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157) > [12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158) > [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2 > [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1 > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01 > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22) > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677 > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22) > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 
1 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0 > [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 > [12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2 > [12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56 > [12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service. > [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0 > [12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0 > [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join). > > > and we do not get them when there is only a single network interface in the systems. 
> > -------------------------------------------------------------------------------------- > These are the network configurations on the three hosts: > > [root at cl15-02 ~]# ifconfig | grep inet > inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > [root at cl15-08 ~]# ifconfig | grep inet > inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > [root at cl15-09 ~]# ifconfig | grep inet > inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > ----------------------------------------------------------------------------------- > corosync-quorumtool output: > > [root at cl15-02 ~]# corosync-quorumtool > Quorum information > ------------------ > Date: Mon Apr 11 15:46:26 2016 > Quorum provider: corosync_votequorum > Nodes: 3 > Node ID: 1 > Ring ID: 18952 > Quorate: Yes > > Votequorum information > ---------------------- > Expected votes: 3 > Highest expected: 3 > Total votes: 3 > Quorum: 2 > Flags: Quorate > > Membership information > ---------------------- > Nodeid Votes Name > 1 1 cl15-02 (local) > 2 1 cl15-08 > 3 1 cl15-09 > > --------------------------------------------------------------------------- > /etc/corosync/corosync.conf: > > [root at cl15-02 ~]# cat /etc/corosync/corosync.conf > totem { > version: 2 > secauth: off > cluster_name: gfs_cluster > transport: udpu > } > > nodelist { > node { > ring0_addr: cl15-02 > nodeid: 1 > } > > node { > ring0_addr: cl15-08 > nodeid: 2 > } > > node { > ring0_addr: cl15-09 > nodeid: 3 > } > } > > quorum { > provider: corosync_votequorum > } > > logging { > debug: on You have debug logging on. At a guess I would say that the config file with the other interface in it doesn't :) Chrissie > to_logfile: yes > logfile: /var/log/cluster/corosync.log > to_syslog: yes > } > From stefano.panella at citrix.com Tue Apr 12 14:02:02 2016 From: stefano.panella at citrix.com (Stefano Panella) Date: Tue, 12 Apr 2016 14:02:02 +0000 Subject: [Linux-cluster] Help with corosync and GFS2 on multi network setup In-Reply-To: <570CF7E1.3090309@redhat.com> References: , <570CF7E1.3090309@redhat.com> Message-ID: <1460469693577.36082@citrix.com> Hi Christine, thanks for your input. I have checked and in the configuration with only one network I have debugging turned on as well (same corosync.conf files). These messages are repeating every 1-2 seconds and the reason why I think there is something wrong is that if I do operation on a sqlite3 db on the GFS2 filesystem the operations are much slower when I have the secondary network as well (and the extra logging) If I try to strace the sqlite3 command, it is stuck for few seconds (very similar to the period of the logging repeating) in a fcntl system call needed to lock the db file ________________________________________ From: linux-cluster-bounces at redhat.com on behalf of Christine Caulfield Sent: Tuesday, April 12, 2016 2:28 PM To: linux-cluster at redhat.com Subject: Re: [Linux-cluster] Help with corosync and GFS2 on multi network setup On 12/04/16 13:45, Stefano Panella wrote: > Hi everybody, > > we have been using corosync directly to provide clustering for GFS2 on our centos 7.2 pools with only one network interface and all has been working great so far! 
> > We now have a new set-up with two network interfaces for every host in the cluster: > A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X) > B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X) > > when we run corosync in this mode we get the logs continuously spammed by messages like these: > > [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout). > [12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep. > [12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10 > [12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750 > [12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state. > [12880] cl15-02 corosyncdebug [TOTEM ] got commit token > [12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state. > [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41: > [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47: > [12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41: > [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 > [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 > [12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47: > [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 > [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 > > [12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery. > [12880] cl15-02 corosyncdebug [TOTEM ] got commit token > [12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0 > [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 > [12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0 > [12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state > [12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0 > [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1 > Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) > Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) > [12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state. > [12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. 
Members > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157) > [12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158) > [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2 > [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) > [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1 > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01 > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22) > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677 > Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22) > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874 > [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 
1 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No > [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 > [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 > [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0 > [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 > [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 > [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 > [12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2 > [12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56 > [12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service. > [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0 > [12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0 > [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join). > > > and we do not get them when there is only a single network interface in the systems. 
> > -------------------------------------------------------------------------------------- > These are the network configurations on the three hosts: > > [root at cl15-02 ~]# ifconfig | grep inet > inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > [root at cl15-08 ~]# ifconfig | grep inet > inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > [root at cl15-09 ~]# ifconfig | grep inet > inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255 > inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255 > inet 127.0.0.1 netmask 255.0.0.0 > > ----------------------------------------------------------------------------------- > corosync-quorumtool output: > > [root at cl15-02 ~]# corosync-quorumtool > Quorum information > ------------------ > Date: Mon Apr 11 15:46:26 2016 > Quorum provider: corosync_votequorum > Nodes: 3 > Node ID: 1 > Ring ID: 18952 > Quorate: Yes > > Votequorum information > ---------------------- > Expected votes: 3 > Highest expected: 3 > Total votes: 3 > Quorum: 2 > Flags: Quorate > > Membership information > ---------------------- > Nodeid Votes Name > 1 1 cl15-02 (local) > 2 1 cl15-08 > 3 1 cl15-09 > > --------------------------------------------------------------------------- > /etc/corosync/corosync.conf: > > [root at cl15-02 ~]# cat /etc/corosync/corosync.conf > totem { > version: 2 > secauth: off > cluster_name: gfs_cluster > transport: udpu > } > > nodelist { > node { > ring0_addr: cl15-02 > nodeid: 1 > } > > node { > ring0_addr: cl15-08 > nodeid: 2 > } > > node { > ring0_addr: cl15-09 > nodeid: 3 > } > } > > quorum { > provider: corosync_votequorum > } > > logging { > debug: on You have debug logging on. At a guess I would say that the config file with the other interface in it doesn't :) Chrissie > to_logfile: yes > logfile: /var/log/cluster/corosync.log > to_syslog: yes > } > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From ccaulfie at redhat.com Tue Apr 12 15:53:08 2016 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 12 Apr 2016 16:53:08 +0100 Subject: [Linux-cluster] Help with corosync and GFS2 on multi network setup In-Reply-To: <1460469693577.36082@citrix.com> References: <570CF7E1.3090309@redhat.com> <1460469693577.36082@citrix.com> Message-ID: <570D19E4.5090203@redhat.com> On 12/04/16 15:02, Stefano Panella wrote: > Hi Christine, > > thanks for your input. I have checked and in the configuration with only one network I have debugging turned on as well (same corosync.conf files). > > These messages are repeating every 1-2 seconds and the reason why I think there is something wrong is that if I do operation on a sqlite3 db on the GFS2 filesystem the operations are much slower when I have the secondary network as well (and the extra logging) > The messages are just debugging messages - they are not indicative of any problem. If anything they show that everything is fine - with corosync at least. They will slow things down a little though. 
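In that case the flood can be silenced by flipping the flag in the logging section of the corosync.conf quoted below, for example:

  logging {
      debug: off
      to_logfile: yes
      logfile: /var/log/cluster/corosync.log
      to_syslog: yes
  }

(corosync then needs a restart, or a configuration reload if the version in use supports it, to pick the change up).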
Chrissie > If I try to strace the sqlite3 command, it is stuck for few seconds (very similar to the period of the logging repeating) in a fcntl system call needed to lock the db file > ________________________________________ > From: linux-cluster-bounces at redhat.com on behalf of Christine Caulfield > Sent: Tuesday, April 12, 2016 2:28 PM > To: linux-cluster at redhat.com > Subject: Re: [Linux-cluster] Help with corosync and GFS2 on multi network setup > > On 12/04/16 13:45, Stefano Panella wrote: >> Hi everybody, >> >> we have been using corosync directly to provide clustering for GFS2 on our centos 7.2 pools with only one network interface and all has been working great so far! >> >> We now have a new set-up with two network interfaces for every host in the cluster: >> A -> 1 Gbit (the one we would like corosync to use, 10.220.88.X) >> B -> 10 Gbit (used for iscsi connection to storage, 10.220.246.X) >> >> when we run corosync in this mode we get the logs continuously spammed by messages like these: >> >> [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 0(consensus timeout). >> [12880] cl15-02 corosyncdebug [TOTEM ] Creating commit token because I am the rep. >> [12880] cl15-02 corosyncdebug [TOTEM ] Saving state aru 10 high seq received 10 >> [12880] cl15-02 corosyncdebug [MAIN ] Storing new sequence id for ring 5750 >> [12880] cl15-02 corosyncdebug [TOTEM ] entering COMMIT state. >> [12880] cl15-02 corosyncdebug [TOTEM ] got commit token >> [12880] cl15-02 corosyncdebug [TOTEM ] entering RECOVERY state. >> [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [0] member 10.220.88.41: >> [12880] cl15-02 corosyncdebug [TOTEM ] TRANS [1] member 10.220.88.47: >> [12880] cl15-02 corosyncdebug [TOTEM ] position [0] member 10.220.88.41: >> [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 >> [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 >> [12880] cl15-02 corosyncdebug [TOTEM ] position [1] member 10.220.88.47: >> [12880] cl15-02 corosyncdebug [TOTEM ] previous ring seq 574c rep 10.220.88.41 >> [12880] cl15-02 corosyncdebug [TOTEM ] aru 10 high delivered 10 received flag 1 >> >> [12880] cl15-02 corosyncdebug [TOTEM ] Did not need to originate any messages in recovery. 
>> [12880] cl15-02 corosyncdebug [TOTEM ] got commit token >> [12880] cl15-02 corosyncdebug [TOTEM ] Sending initial ORF token >> [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 0, aru 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 1, aru 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 2, aru 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] token retrans flag is 0 my set retrans flag0 retrans queue empty 1 count 3, aru 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] install seq 0 aru 0 high seq received 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] retrans flag count 4 token aru 0 install seq 0 aru 0 0 >> [12880] cl15-02 corosyncdebug [TOTEM ] Resetting old ring state >> [12880] cl15-02 corosyncdebug [TOTEM ] recovery to regular 1-0 >> [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 1 >> Apr 11 16:19:54 [13372] cl15-02 pacemakerd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) >> Apr 11 16:19:54 [13378] cl15-02 crmd: info: pcmk_quorum_notification: Membership 22352: quorum retained (2) >> [12880] cl15-02 corosyncdebug [TOTEM ] entering OPERATIONAL state. >> [12880] cl15-02 corosyncnotice [TOTEM ] A new membership (10.220.88.41:22352) was formed. Members >> [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync configuration map access >> Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section nodes to master (origin=local/crmd/27157) >> [12880] cl15-02 corosyncdebug [CMAP ] Not first sync -> no action >> Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/27158) >> [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x2 >> [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) >> [12880] cl15-02 corosyncdebug [CPG ] comparing: sender r(0) ip(10.220.88.47) ; members(old:2 left:0) >> [12880] cl15-02 corosyncdebug [CPG ] chosen downlist: sender r(0) ip(10.220.88.41) ; members(old:2 left:0) >> [12880] cl15-02 corosyncdebug [CPG ] got joinlist message from node 0x1 >> [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync cluster closed process group service v1.01 >> Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section nodes: OK (rc=0, origin=cl15-02/crmd/27157, version=0.18.22) >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[0] group:clvmd, ip:r(0) ip(10.220.88.41) , pid:35677 >> Apr 11 16:19:54 [13373] cl15-02 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=cl15-02/crmd/27158, version=0.18.22) >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[1] group:dlm:ls:clvmd\x00, ip:r(0) ip(10.220.88.41) , pid:34995 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[2] group:dlm:controld\x00, ip:r(0) ip(10.220.88.41) , pid:34995 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[3] group:crmd\x00, ip:r(0) ip(10.220.88.41) , pid:13378 >> [12880] 
cl15-02 corosyncdebug [CPG ] joinlist_messages[4] group:attrd\x00, ip:r(0) ip(10.220.88.41) , pid:13376 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[5] group:stonith-ng\x00, ip:r(0) ip(10.220.88.41) , pid:13374 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[6] group:cib\x00, ip:r(0) ip(10.220.88.41) , pid:13373 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[7] group:pacemakerd\x00, ip:r(0) ip(10.220.88.41) , pid:13372 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[8] group:crmd\x00, ip:r(0) ip(10.220.88.47) , pid:12879 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[9] group:attrd\x00, ip:r(0) ip(10.220.88.47) , pid:12877 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[10] group:stonith-ng\x00, ip:r(0) ip(10.220.88.47) , pid:12875 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[11] group:cib\x00, ip:r(0) ip(10.220.88.47) , pid:12874 >> [12880] cl15-02 corosyncdebug [CPG ] joinlist_messages[12] group:pacemakerd\x00, ip:r(0) ip(10.220.88.47) , pid:12873 >> [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No >> [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[1]: votes: 1, expected: 3 flags: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No >> [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 >> [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 >> [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[2]: votes: 1, expected: 3 flags: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: No Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No >> [12880] cl15-02 corosyncdebug [VOTEQ ] got nodeinfo message from cluster node 2 >> [12880] cl15-02 corosyncdebug [VOTEQ ] nodeinfo message[0]: votes: 0, expected: 0 flags: 0 >> [12880] cl15-02 corosyncdebug [SYNC ] Committing synchronization for corosync vote quorum service v1.0 >> [12880] cl15-02 corosyncdebug [VOTEQ ] total_votes=2, expected_votes=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 1 state=1, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 2 state=1, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] node 3 state=2, votes=1, expected=3 >> [12880] cl15-02 corosyncdebug [VOTEQ ] lowest node id: 1 us: 1 >> [12880] cl15-02 corosyncdebug [VOTEQ ] highest node id: 2 us: 1 >> [12880] cl15-02 corosyncnotice [QUORUM] Members[2]: 1 2 >> [12880] cl15-02 corosyncdebug [QUORUM] sending quorum notification to (nil), length = 56 >> [12880] cl15-02 corosyncnotice [MAIN ] Completed service synchronization, ready to provide service. 
>> [12880] cl15-02 corosyncdebug [TOTEM ] waiting_trans_ack changed to 0 >> [12880] cl15-02 corosyncdebug [QUORUM] got quorate request on 0x7f5a907749a0 >> [12880] cl15-02 corosyncdebug [TOTEM ] entering GATHER state from 11(merge during join). >> >> >> and we do not get them when there is only a single network interface in the systems. >> >> -------------------------------------------------------------------------------------- >> These are the network configurations on the three hosts: >> >> [root at cl15-02 ~]# ifconfig | grep inet >> inet 10.220.88.41 netmask 255.255.248.0 broadcast 10.220.95.255 >> inet 10.220.246.50 netmask 255.255.255.0 broadcast 10.220.246.255 >> inet 127.0.0.1 netmask 255.0.0.0 >> >> [root at cl15-08 ~]# ifconfig | grep inet >> inet 10.220.88.47 netmask 255.255.248.0 broadcast 10.220.95.255 >> inet 10.220.246.51 netmask 255.255.255.0 broadcast 10.220.246.255 >> inet 127.0.0.1 netmask 255.0.0.0 >> >> [root at cl15-09 ~]# ifconfig | grep inet >> inet 10.220.88.48 netmask 255.255.248.0 broadcast 10.220.95.255 >> inet 10.220.246.59 netmask 255.255.255.0 broadcast 10.220.246.255 >> inet 127.0.0.1 netmask 255.0.0.0 >> >> ----------------------------------------------------------------------------------- >> corosync-quorumtool output: >> >> [root at cl15-02 ~]# corosync-quorumtool >> Quorum information >> ------------------ >> Date: Mon Apr 11 15:46:26 2016 >> Quorum provider: corosync_votequorum >> Nodes: 3 >> Node ID: 1 >> Ring ID: 18952 >> Quorate: Yes >> >> Votequorum information >> ---------------------- >> Expected votes: 3 >> Highest expected: 3 >> Total votes: 3 >> Quorum: 2 >> Flags: Quorate >> >> Membership information >> ---------------------- >> Nodeid Votes Name >> 1 1 cl15-02 (local) >> 2 1 cl15-08 >> 3 1 cl15-09 >> >> --------------------------------------------------------------------------- >> /etc/corosync/corosync.conf: >> >> [root at cl15-02 ~]# cat /etc/corosync/corosync.conf >> totem { >> version: 2 >> secauth: off >> cluster_name: gfs_cluster >> transport: udpu >> } >> >> nodelist { >> node { >> ring0_addr: cl15-02 >> nodeid: 1 >> } >> >> node { >> ring0_addr: cl15-08 >> nodeid: 2 >> } >> >> node { >> ring0_addr: cl15-09 >> nodeid: 3 >> } >> } >> >> quorum { >> provider: corosync_votequorum >> } >> >> logging { >> debug: on > > > You have debug logging on. At a guess I would say that the config file > with the other interface in it doesn't :) > > Chrissie > > >> to_logfile: yes >> logfile: /var/log/cluster/corosync.log >> to_syslog: yes >> } >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jonathan.davies at citrix.com Fri Apr 15 14:55:02 2016 From: jonathan.davies at citrix.com (Jonathan Davies) Date: Fri, 15 Apr 2016 15:55:02 +0100 Subject: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss Message-ID: <571100C6.1050606@citrix.com> Dear linux-cluster, I have made some observations about the behaviour of gfs2 and would appreciate confirmation of whether this is expected behaviour or something has gone wrong. I have a three-node cluster -- let's call the nodes A, B and C. On each of nodes A and B, I have a loop that repeatedly writes an increasing integer value to a file in the GFS2-mountpoint. On node C, I have a loop that reads from both these files from the GFS2-mountpoint. The reads on node C show the latest values written by A and B, and stay up-to-date. All good so far. 
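As a rough sketch of what those two loops look like (the file names here are made up, and the real test -- see the notes further down -- writes through an O_DIRECT|O_SYNC file descriptor, which dd approximates with oflag=direct,sync):

  # writer loop on node A; node B runs the same against its own file
  i=0
  while true; do
      printf '%016d' "$i" | dd of=/mnt/gfs2/counter-a bs=4096 count=1 \
          conv=sync,notrunc oflag=direct,sync 2>/dev/null
      i=$((i + 1))
  done

  # reader loop on node C, polling both files
  while true; do
      for f in /mnt/gfs2/counter-a /mnt/gfs2/counter-b; do
          dd if="$f" bs=4096 count=1 iflag=direct 2>/dev/null | tr -d '\0'
          echo
      done
      sleep 1
  done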
I then cause node A to drop the corosync heartbeat by executing the following on node A: iptables -I INPUT -p udp --dport 5404 -j DROP iptables -I INPUT -p udp --dport 5405 -j DROP iptables -I INPUT -p tcp --dport 21064 -j DROP After a few seconds, I normally observe that all I/O to the GFS2 filesystem hangs forever on node A: the latest value read by node C is the same as the last successful write by node A. This is exactly the behaviour I want -- I want to be sure that node A never completes I/O that is not able to be seen by other nodes. However, on some occasions, I observe that node A continues in the loop believing that it is successfully writing to the file but, according to node C, the file stops being updated. (Meanwhile, the file written by node B continues to be up-to-date as read by C.) This is concerning -- it looks like I/O writes are being completed on node A even though other nodes in the cluster cannot see the results. I performed this test 20 times, rebooting node A between each, and saw the "I/O hanging" behaviour 16 times and the "I/O appears to continue" behaviour 4 times. I couldn't see anything that might cause it to sometimes adopt one behaviour and sometimes the other. So... is this expected? Should I be able to rely upon I/O hanging? Or have I misconfigured something? Advice would be appreciated. Thanks, Jonathan Notes: * The I/O from node A uses an fd that is O_DIRECT|O_SYNC, so the page cache is not involved. * Versions: corosync 2.3.4, dlm_controld 4.0.2, gfs2 as per RHEL 7.2. * I don't see anything particularly useful being logged. Soon after I insert the iptables rules on node A, I see the following on node A: 2016-04-15T14:15:45.608175+00:00 localhost corosync[3074]: [TOTEM ] The token was lost in the OPERATIONAL state. 2016-04-15T14:15:45.608191+00:00 localhost corosync[3074]: [TOTEM ] A processor failed, forming new configuration. 2016-04-15T14:15:45.608198+00:00 localhost corosync[3074]: [TOTEM ] entering GATHER state from 2(The token was lost in the OPERATIONAL state.). Around the time node C sees the output from node A stop changing, node A reports: 2016-04-15T14:15:58.388404+00:00 localhost corosync[3074]: [TOTEM ] entering GATHER state from 0(consensus timeout). * corosync.conf: totem { version: 2 secauth: off cluster_name: 1498d523 transport: udpu token_retransmits_before_loss_const: 10 token: 10000 } logging { debug: on } quorum { provider: corosync_votequorum } nodelist { node { ring0_addr: 10.220.73.6 } node { ring0_addr: 10.220.73.7 } node { ring0_addr: 10.220.73.3 } } From rpeterso at redhat.com Fri Apr 15 15:14:28 2016 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 15 Apr 2016 11:14:28 -0400 (EDT) Subject: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss In-Reply-To: <571100C6.1050606@citrix.com> References: <571100C6.1050606@citrix.com> Message-ID: <1642816801.51620371.1460733268322.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Dear linux-cluster, > > I have made some observations about the behaviour of gfs2 and would > appreciate confirmation of whether this is expected behaviour or > something has gone wrong. > > I have a three-node cluster -- let's call the nodes A, B and C. On each > of nodes A and B, I have a loop that repeatedly writes an increasing > integer value to a file in the GFS2-mountpoint. On node C, I have a loop > that reads from both these files from the GFS2-mountpoint. The reads on > node C show the latest values written by A and B, and stay up-to-date. > All good so far. 
> > I then cause node A to drop the corosync heartbeat by executing the > following on node A: > > iptables -I INPUT -p udp --dport 5404 -j DROP > iptables -I INPUT -p udp --dport 5405 -j DROP > iptables -I INPUT -p tcp --dport 21064 -j DROP > > After a few seconds, I normally observe that all I/O to the GFS2 > filesystem hangs forever on node A: the latest value read by node C is > the same as the last successful write by node A. This is exactly the > behaviour I want -- I want to be sure that node A never completes I/O > that is not able to be seen by other nodes. > > However, on some occasions, I observe that node A continues in the loop > believing that it is successfully writing to the file but, according to > node C, the file stops being updated. (Meanwhile, the file written by > node B continues to be up-to-date as read by C.) This is concerning -- > it looks like I/O writes are being completed on node A even though other > nodes in the cluster cannot see the results. > > I performed this test 20 times, rebooting node A between each, and saw > the "I/O hanging" behaviour 16 times and the "I/O appears to continue" > behaviour 4 times. I couldn't see anything that might cause it to > sometimes adopt one behaviour and sometimes the other. > > So... is this expected? Should I be able to rely upon I/O hanging? Or > have I misconfigured something? Advice would be appreciated. > > Thanks, > Jonathan Hi Jonathan, This seems like expected behavior to me. It probably all goes back to whatever node "masters" the glock and the node that "owns" the glock, when communications are lost. In your test, the DLM lock is being traded back and forth between the file's writer on A and the file's reader on C. Then communication to the DLM is blocked. When that happens, if the reader (C) happens to own the DLM lock when it loses DLM communications, the writer will block on DLM, and can't write a new value. The reader owns the lock, so it keeps reading the same value over and over. However, if A happens to own the DLM lock, it does not need to ask DLM's permission because it owns the lock. Therefore, it goes on writing. Meanwhile, the other node can't get DLM's permission to get the lock back, so it hangs. There's also the problem of the DLM lock "master" which presents another level of complexity to the mix, but let's not go into that now. Suffice it to say I think it's working as expected. Regards, Bob Peterson Red Hat File Systems From teigland at redhat.com Fri Apr 15 16:14:37 2016 From: teigland at redhat.com (David Teigland) Date: Fri, 15 Apr 2016 11:14:37 -0500 Subject: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss In-Reply-To: <1642816801.51620371.1460733268322.JavaMail.zimbra@redhat.com> References: <571100C6.1050606@citrix.com> <1642816801.51620371.1460733268322.JavaMail.zimbra@redhat.com> Message-ID: <20160415161437.GB10934@redhat.com> > > However, on some occasions, I observe that node A continues in the loop > > believing that it is successfully writing to the file node A has the exclusive lock, so it continues writing... > > but, according to > > node C, the file stops being updated. (Meanwhile, the file written by > > node B continues to be up-to-date as read by C.) This is concerning -- > > it looks like I/O writes are being completed on node A even though other > > nodes in the cluster cannot see the results. Is node C blocked trying to read the file A is writing? That's what we'd expect until recovery has removed node A.
Or are C's reads completing while A continues writing the file? That would not be correct. > However, if A happens to own the DLM lock, it does not need > to ask DLM's permission because it owns the lock. Therefore, it goes > on writing. Meanwhile, the other node can't get DLM's permission to > get the lock back, so it hangs. The description sounds like C might not be hanging in read as we'd expect while A continues writing. If that's the case, then it implies that dlm recovery has been completed by nodes B and C (removing A), which allows the lock to be granted to C for reading. If dlm recovery on B/C has completed, it means that A should have been fenced, so A should not be able to write once C is given the lock. Dave From jonathan.davies at citrix.com Mon Apr 18 13:12:58 2016 From: jonathan.davies at citrix.com (Jonathan Davies) Date: Mon, 18 Apr 2016 14:12:58 +0100 Subject: [Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss In-Reply-To: <20160415161437.GB10934@redhat.com> References: <571100C6.1050606@citrix.com> <1642816801.51620371.1460733268322.JavaMail.zimbra@redhat.com> <20160415161437.GB10934@redhat.com> Message-ID: <5714DD5A.2040400@citrix.com> On 15/04/16 17:14, David Teigland wrote: >>> However, on some occasions, I observe that node A continues in the loop >>> believing that it is successfully writing to the file > > node A has the exclusive lock, so it continues writing... > >>> but, according to >>> node C, the file stops being updated. (Meanwhile, the file written by >>> node B continues to be up-to-date as read by C.) This is concerning -- >>> it looks like I/O writes are being completed on node A even though other >>> nodes in the cluster cannot see the results. > > Is node C blocked trying to read the file A is writing? That's what we'd > expect until recovery has removed node A. Or are C's reads completing > while A continues writing the file? That would not be correct. > >> However, if A happens to own the DLM lock, it does not need >> to ask DLM's permission because it owns the lock. Therefore, it goes >> on writing. Meanwhile, the other node can't get DLM's permission to >> get the lock back, so it hangs. > > The description sounds like C might not be hanging in read as we'd expect > while A continues writing. If that's the case, then it implies that dlm > recovery has been completed by nodes B and C (removing A), which allows > the lock to be granted to C for reading. If dlm recovery on B/C has > completed, it means that A should have been fenced, so A should not be > able to write once C is given the lock. Thanks Bob and Dave for your very helpful insights. Your line of reasoning led me to realise that I am running dlm with fencing disabled, which explains everything. Node C was not hanging in read while A continued to write; it was constantly returning an old value. I presume that's legitimate: C believes the value it saw last must still be up-to-date because, as far as C is concerned, A must have been fenced and so couldn't have updated it. (It also explains why I didn't see anything useful in the logs.) When I run the same test with fencing enabled, then although A continues writing after the failure, the read on C hangs until A is fenced, at which point it is able to read the last value A wrote. That's exactly what I want. Apologies for the noise, and thanks for the explanations. Jonathan
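The practical difference between the two runs comes down to dlm_controld's fencing configuration. A minimal sketch of the two settings being compared -- assuming the stock /etc/dlm/dlm.conf location and the enable_fencing option described in dlm.conf(5); a working fence agent or Pacemaker stonith setup still has to be configured separately for fencing to actually do anything -- might look like this:

  # /etc/dlm/dlm.conf -- illustrative sketch, not the poster's actual file

  # What the failing runs above correspond to: dlm recovery completes
  # without ever fencing the lost node, so node A can keep writing while
  # B and C carry on without it.
  enable_fencing=0

  # What the corrected behaviour corresponds to (fencing enabled is the
  # default): reads on C block until node A has really been fenced.
  # enable_fencing=1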