[Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss

Jonathan Davies jonathan.davies at citrix.com
Fri Apr 15 14:55:02 UTC 2016


Dear linux-cluster,

I have made some observations about the behaviour of gfs2 and would 
appreciate confirmation of whether this is expected behaviour or whether 
something has gone wrong.

I have a three-node cluster -- let's call the nodes A, B and C. On each 
of nodes A and B, I have a loop that repeatedly writes an increasing 
integer value to a file in the GFS2 mountpoint. On node C, I have a loop 
that reads both of these files from the GFS2 mountpoint. The reads on 
node C show the latest values written by A and B, and stay up to date. 
All good so far.
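
For reference, the two loops are roughly along these lines (paths, block 
size and polling interval are illustrative rather than the exact scripts 
I run):

# writer loop on node A (and likewise on B); /mnt/gfs2/a.dat is a
# placeholder path; oflag=direct,sync corresponds to the O_DIRECT|O_SYNC
# note further down
i=0
while :; do
    printf '%-4096d' "$i" |
      dd of=/mnt/gfs2/a.dat bs=4096 count=1 iflag=fullblock \
         oflag=direct,sync conv=notrunc status=none
    i=$((i+1))
done

# reader loop on node C, polling both files once a second
while :; do
    for f in /mnt/gfs2/a.dat /mnt/gfs2/b.dat; do
        printf '%s: %s\n' "$f" \
          "$(dd if="$f" bs=4096 count=1 status=none | tr -d ' ')"
    done
    sleep 1
done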

I then cause node A to drop the corosync heartbeat by executing the 
following on node A:

iptables -I INPUT -p udp --dport 5404 -j DROP
iptables -I INPUT -p udp --dport 5405 -j DROP
iptables -I INPUT -p tcp --dport 21064 -j DROP
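
(To confirm on node A that the cut has taken effect, I can check the 
local view of membership and of the gfs2 lockspace with something like 
the following -- exact output varies, but after the token timeout quorum 
should be reported as lost:)

corosync-quorumtool -s   # local membership/quorum status
dlm_tool ls              # state of the gfs2 lockspace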

After a few seconds, I normally observe that all I/O to the GFS2 
filesystem hangs forever on node A: the latest value read by node C is 
the same as the last successful write by node A. This is exactly the 
behaviour I want -- I want to be sure that node A never completes I/O 
that the other nodes cannot see.

However, on some occasions, I observe that node A continues in the loop 
believing that it is successfully writing to the file but, according to 
node C, the file stops being updated. (Meanwhile, the file written by 
node B continues to be up-to-date as read by C.) This is concerning -- 
it looks like I/O writes are being completed on node A even though other 
nodes in the cluster cannot see the results.

I performed this test 20 times, rebooting node A between runs, and saw 
the "I/O hanging" behaviour 16 times and the "I/O appears to continue" 
behaviour 4 times. I couldn't identify anything that determines which 
behaviour a given run will show.

So... is this expected? Should I be able to rely upon I/O hanging? Or 
have I misconfigured something? Advice would be appreciated.

Thanks,
Jonathan

Notes:
  * The I/O from node A uses an fd opened with O_DIRECT|O_SYNC, so the 
page cache is not involved.

  * Versions: corosync 2.3.4, dlm_controld 4.0.2, gfs2 as per RHEL 7.2.

  * I don't see anything particularly useful being logged. Soon after I 
insert the iptables rules on node A, I see the following on node A:

2016-04-15T14:15:45.608175+00:00 localhost corosync[3074]:  [TOTEM ] The 
token was lost in the OPERATIONAL state.
2016-04-15T14:15:45.608191+00:00 localhost corosync[3074]:  [TOTEM ] A 
processor failed, forming new configuration.
2016-04-15T14:15:45.608198+00:00 localhost corosync[3074]:  [TOTEM ] 
entering GATHER state from 2(The token was lost in the OPERATIONAL state.).

Around the time node C sees the output from node A stop changing, node A 
reports:

2016-04-15T14:15:58.388404+00:00 localhost corosync[3074]:  [TOTEM ] 
entering GATHER state from 0(consensus timeout).

  * corosync.conf:

totem {
   version: 2
   secauth: off
   cluster_name: 1498d523
   transport: udpu
   token_retransmits_before_loss_const: 10
   token: 10000
}

logging {
   debug: on
}

quorum {
   provider: corosync_votequorum
}

nodelist {
   node {
     ring0_addr: 10.220.73.6
   }
   node {
     ring0_addr: 10.220.73.7
   }
   node {
     ring0_addr: 10.220.73.3
   }
}
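
For what it's worth, the timing of the log messages above looks 
consistent with these settings, assuming corosync's documented default 
of consensus = 1.2 * token:

   token                = 10000 ms
   consensus (default)  = 1.2 * 10000 ms = 12000 ms
   gap from "The token was lost" (14:15:45.6) to
   "consensus timeout" (14:15:58.4)       ~= 12.8 s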



