[Linux-cluster] GFS2 DLM problem on NVMes
teigland at redhat.com
Mon Nov 20 19:09:32 UTC 2017
> We are developing storage systems using 10 NVMes (current test set).
> Using MD RAID10 + CLVM/GFS2 over four hosts achieves 22 GB/s (Max. on Reads).
Does MD RAID10 work correctly under GFS2? Does the RAID10 make use of the
recent md-cluster enhancements (which also use the dlm)?
> However, a GFS2 DLM problem occurred. The problem is that each host
> frequently reports dlm: gfs2: send_repeat_remove kernel messages,
> and I/O throughput becomes unstable and low.
send_repeat_remove is a mysterious corner case, related to the resource
directory becoming out of sync with the actual resource master. There's
an inherent race in this area of the dlm which is hard to solve because
the same record (mapping of resource name to master nodeid) needs to be
changed consistently on two nodes. Perhaps in the future the dlm could be
enhanced with some algorithm to do that better. For now, it just repeats
the change (logging the message you see). If the repeated operation is
working, then things won't be permanently stuck.
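
To judge how often the repeat actually fires on a host, the kernel log can be counted for the message. This is only a minimal sketch: the helper name count_repeat_remove is made up for illustration, and it assumes the log lines contain the literal string send_repeat_remove, as in the report quoted above.

```shell
#!/bin/sh
# count_repeat_remove (hypothetical helper): read kernel log text on
# stdin and print how many lines mention send_repeat_remove.
count_repeat_remove() {
    # grep -c prints the number of matching lines. It exits non-zero
    # when that number is 0, so mask the status to keep callers simple.
    grep -c 'send_repeat_remove' || true
}
```

For example, `dmesg | count_repeat_remove` on each host, run before and after a tuning change, gives a rough basis for comparison.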
The most likely cause, it seems to me, is that the speed of storage
relative to the speed of the network is triggering pathological timing
issues in the dlm. Try adjusting the "toss_secs" tunable, which controls
how long a node will hold on to an unused resource before giving up
mastery of it (the master change is what leads to the inconsistency
described above):
echo 1000 > /sys/kernel/config/dlm/cluster/toss_secs
The default is 10; I'd try 100, 1000, or 10000. Too large a value could
have the negative consequence of not freeing dlm resources that will never
be used again, e.g. if you are deleting a lot of files. Set this number
before mounting gfs2 for it to take effect.
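
The tuning step can be wrapped in a small helper that validates the value and shows the old and new settings. This is a sketch under assumptions: set_toss_secs and the DLM_CLUSTER_DIR variable are hypothetical names, and the default path is the configfs location used in the echo command above, which exists only once the dlm cluster has been configured.

```shell
#!/bin/sh
# DLM_CLUSTER_DIR (hypothetical variable) defaults to the configfs
# directory used in the echo command above.
DLM_CLUSTER_DIR="${DLM_CLUSTER_DIR:-/sys/kernel/config/dlm/cluster}"

# set_toss_secs (hypothetical helper): write a new toss_secs value.
# Call it before mounting, since the value is read at mount time.
set_toss_secs() {
    value="$1"
    file="$DLM_CLUSTER_DIR/toss_secs"

    # Accept only a non-empty string of digits.
    case "$value" in
        ''|*[!0-9]*)
            echo "toss_secs must be a non-negative integer" >&2
            return 1
            ;;
    esac

    # The file is missing until the dlm cluster is set up.
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi

    echo "old toss_secs: $(cat "$file")"
    echo "$value" > "$file"
    echo "new toss_secs: $(cat "$file")"
}
```

Running `set_toss_secs 1000` as root on every node, and only then mounting the filesystem, mirrors the manual step above.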
In the past, I think that send_repeat_remove has tended to appear when
there's a huge volume of dlm messages, triggered by excessive caching done
by gfs when there's a large amount of system memory. That volume of
dlm messages causes them to arrive in unusual sequences, reversing the
usual order of cause and effect.