[Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss
rpeterso at redhat.com
Fri Apr 15 15:14:28 UTC 2016
----- Original Message -----
> Dear linux-cluster,
> I have made some observations about the behaviour of gfs2 and would
> appreciate confirmation of whether this is expected behaviour or
> something has gone wrong.
> I have a three-node cluster -- let's call the nodes A, B and C. On each
> of nodes A and B, I have a loop that repeatedly writes an increasing
> integer value to a file in the GFS2-mountpoint. On node C, I have a loop
> that reads from both these files from the GFS2-mountpoint. The reads on
> node C show the latest values written by A and B, and stay up-to-date.
> All good so far.
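> For reference, the writer/reader loops were essentially the following
> (sketched here against a local directory and with a bounded loop so it
> terminates; the real test ran indefinitely against the shared GFS2
> mountpoint, with node B running the same writer loop on its own file):

```shell
#!/bin/sh
# Illustrative stand-in for the GFS2 mountpoint; in the real test this
# was a shared GFS2 filesystem mounted on all three nodes.
MNT=/tmp/gfs2-demo
mkdir -p "$MNT"

# Writer loop (run on node A; node B does the same with counter-B):
# repeatedly write an increasing integer to a file on the shared fs.
i=0
while [ "$i" -lt 5 ]; do          # the real test loops forever
    i=$((i + 1))
    echo "$i" > "$MNT/counter-A"
done

# Reader loop (run on node C): poll the latest value written by A.
cat "$MNT/counter-A"
```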
> I then cause node A to drop the corosync heartbeat by executing the
> following on node A:
> iptables -I INPUT -p udp --dport 5404 -j DROP
> iptables -I INPUT -p udp --dport 5405 -j DROP
> iptables -I INPUT -p tcp --dport 21064 -j DROP
> After a few seconds, I normally observe that all I/O to the GFS2
> filesystem hangs forever on node A: the latest value read by node C is
> the same as the last successful write by node A. This is exactly the
> behaviour I want -- I want to be sure that node A never completes I/O
> that is not able to be seen by other nodes.
> However, on some occasions, I observe that node A continues in the loop
> believing that it is successfully writing to the file but, according to
> node C, the file stops being updated. (Meanwhile, the file written by
> node B continues to be up-to-date as read by C.) This is concerning --
> it looks like I/O writes are being completed on node A even though other
> nodes in the cluster cannot see the results.
> I performed this test 20 times, rebooting node A between each, and saw
> the "I/O hanging" behaviour 16 times and the "I/O appears to continue"
> behaviour 4 times. I couldn't see anything that might cause it to
> sometimes adopt one behaviour and sometimes the other.
> So... is this expected? Should I be able to rely upon I/O hanging? Or
> have I misconfigured something? Advice would be appreciated.
This seems like expected behavior to me. It likely comes down to which
node "masters" the glock and which node "owns" the glock at the moment
communications are lost.
In your test, the DLM lock is being traded back and forth between the
file's writer on A and the file's reader on C. Then communication to
the DLM is blocked.
When that happens, if the reader (C) happens to own the DLM lock at the
moment communications are lost, the writer (A) blocks waiting on DLM and
can't write a new value. The reader still owns the lock, so it keeps
reading the same value over and over.
However, if A happens to own the DLM lock, it does not need
to ask DLM's permission because it owns the lock. Therefore, it goes
on writing. Meanwhile, the other node can't get DLM's permission to
get the lock back, so it hangs.
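To illustrate the asymmetry, here is a rough single-machine analogy using
flock(1) rather than DLM (paths are illustrative): the process holding the
lock keeps writing without asking anyone, while a contender cannot acquire
it until the holder lets go.

```shell
#!/bin/sh
# Single-machine analogy, NOT DLM itself: the lock owner proceeds freely;
# the other side cannot get the lock while it is held.
LOCK=/tmp/demo.lock

(
    flock -x 9                    # "node A" takes the lock...
    echo "holder: writing"        # ...and keeps writing freely
    sleep 2
) 9>"$LOCK" &

sleep 1
# "node C" tries a non-blocking acquire; it fails while the lock is held.
if flock -xn "$LOCK" -c 'true'; then
    RESULT="got lock"
else
    RESULT="blocked"
fi
echo "contender: $RESULT"
wait
```

The roles are symmetric: whichever side holds the lock when communication
stops continues unhindered, and the other side is the one that hangs.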
There's also the question of which node "masters" the DLM lock, which
adds another layer of complexity, but let's not go into that now.
Suffice it to say I think it's working as expected.
Red Hat File Systems