[Linux-cluster] I/O to gfs2 hanging or not hanging after heartbeat loss

Jonathan Davies jonathan.davies at citrix.com
Mon Apr 18 13:12:58 UTC 2016



On 15/04/16 17:14, David Teigland wrote:
>>> However, on some occasions, I observe that node A continues in the loop
>>> believing that it is successfully writing to the file
>
> node A has the exclusive lock, so it continues writing...
>
>>> but, according to
>>> node C, the file stops being updated. (Meanwhile, the file written by
>>> node B continues to be up-to-date as read by C.) This is concerning --
>>> it looks like I/O writes are being completed on node A even though other
>>> nodes in the cluster cannot see the results.
>
> Is node C blocked trying to read the file A is writing?  That's what we'd
> expect until recovery has removed node A.  Or are C's reads completing
> while A continues writing the file?  That would not be correct.
>
>> However, if A happens to own the DLM lock, it does not need
>> to ask DLM's permission because it owns the lock. Therefore, it goes
>> on writing. Meanwhile, the other node can't get DLM's permission to
>> get the lock back, so it hangs.
>
> The description sounds like C might not be hanging in read as we'd expect
> while A continues writing.  If that's the case, then it implies that dlm
> recovery has been completed by nodes B and C (removing A), which allows
> the lock to be granted to C for reading.  If dlm recovery on B/C has
> completed, it means that A should have been fenced, so A should not be
> able to write once C is given the lock.

Thanks Bob and Dave for your very helpful insights.

Your line of reasoning led me to realise that I am running dlm with
fencing disabled, which explains everything. Node C was not hanging in
read while A continued to write; its reads kept completing and returning
an old value. I presume that's legitimate from C's point of view: once
dlm recovery has removed A, C assumes A has been fenced and so cannot
have updated the file, which means the value C last saw must still be
up-to-date. (It also explains why I didn't see anything useful in the
logs.)

When I run the same test with fencing enabled, A still continues writing
after the failure, but the read on C hangs until A is fenced, at which
point C is able to read the last value A wrote. That's exactly what I
want.
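
A minimal sketch of this kind of test, assuming each writer node rewrites
a small file on the gfs2 mount in a loop and the reader node checks how
old the last value is. The function names, the file layout and the
5-second threshold are illustrative assumptions, not taken from the
original test, and comparing timestamps across nodes assumes their
clocks are roughly in sync.

```python
import os
import time


def write_heartbeat(path, seq):
    """Writer side (nodes A and B): store a sequence number and a timestamp."""
    with open(path, "w") as f:
        f.write(f"{seq} {time.time()}\n")
        f.flush()
        os.fsync(f.fileno())  # push the write out of the local page cache


def read_heartbeat(path):
    """Reader side (node C): return (sequence number, writer timestamp)."""
    with open(path) as f:
        seq, ts = f.read().split()
    return int(seq), float(ts)


def is_stale(ts, now, max_age=5.0):
    """True if the last value read is older than max_age seconds (hypothetical threshold)."""
    return now - ts > max_age
```

With fencing disabled, a reader built like this would keep returning
promptly and start reporting stale values; with fencing enabled, the
call to read_heartbeat() would be expected to block until the failed
writer has been fenced.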

Apologies for the noise, and thanks for the explanations.

Jonathan



