[Cluster-devel] [PATCH] Retry wait_event_interruptible in event of ERESTARTSYS
Mark Syms
mark.syms at citrix.com
Fri Feb 1 14:00:27 UTC 2019
We saw an issue on a production server in a customer deployment where
DLM 4.0.7 gets "stuck" and is unable to join new lockspaces.
See - https://lists.clusterlabs.org/pipermail/users/2019-January/016054.html
This was forwarded off-list to David Teigland, who responded as follows:
"
Hi, thanks for the debugging info. You've spent more time looking at
this than I have, but from a first glance it seems to me that the
initial problem (there may be multiple) is that in the kernel,
lockspace.c do_event() does not sensibly handle the ERESTARTSYS error
from wait_event_interruptible(). I think do_event() should continue
waiting for a uevent result from userspace until it gets one, because
the kernel can't do anything sensible until it gets that.
Dave
"
This change does that. We have it running in automation with no problems
so far, but comments are welcome.
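For readers without the diff to hand, the retry idea Dave describes can be
sketched as a small userspace simulation. This is not the patch itself and
not the kernel code: struct fake_ls, fake_wait_event_interruptible() and
wait_for_uevent_result() are hypothetical names invented for illustration,
and ERESTARTSYS is defined locally because userspace <errno.h> does not
expose it. The stub also consumes one "pending signal" per call, which is a
simplification of how signals behave in the kernel.

```c
/* Kernel-internal restart code, defined here only for the simulation. */
#define ERESTARTSYS 512

/* Hypothetical stand-in for the state do_event() waits on. */
struct fake_ls {
    int signals_pending;  /* number of waits that will be interrupted */
    int uevent_result;    /* result userspace eventually reports */
    int wait_calls;       /* how many wait attempts were made */
};

/* Hypothetical stub imitating wait_event_interruptible(): returns
 * -ERESTARTSYS while an interruption is pending, 0 once the event arrives. */
int fake_wait_event_interruptible(struct fake_ls *ls)
{
    ls->wait_calls++;
    if (ls->signals_pending > 0) {
        ls->signals_pending--;
        return -ERESTARTSYS;
    }
    return 0;
}

/* Keep waiting until a uevent result is available, then return it,
 * instead of giving up on the first -ERESTARTSYS. */
int wait_for_uevent_result(struct fake_ls *ls)
{
    int error;

    do {
        error = fake_wait_event_interruptible(ls);
    } while (error == -ERESTARTSYS);

    if (error)
        return error;
    return ls->uevent_result;
}
```

With signals_pending set to 3, the loop makes four wait attempts and then
returns whatever uevent_result holds, rather than bailing out on the first
interruption.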
Mark Syms (1):
Retry wait_event_interruptible in event of ERESTARTSYS
fs/dlm/lockspace.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--
1.8.3.1