[Cluster-devel] [PATCH] Retry wait_event_interruptible in event of ERESTARTSYS

Mark Syms mark.syms at citrix.com
Fri Feb 1 14:00:27 UTC 2019


We saw an issue on a production server in a customer deployment where
DLM 4.0.7 gets "stuck" and is unable to join new lockspaces.

See: https://lists.clusterlabs.org/pipermail/users/2019-January/016054.html

This was forwarded off-list to David Teigland, who responded as follows.

"
Hi, thanks for the debugging info.  You've spent more time looking at
this than I have, but from a first glance it seems to me that the
initial problem (there may be multiple) is that in the kernel,
lockspace.c do_event() does not sensibly handle the ERESTARTSYS error
from wait_event_interruptible().  I think do_event() should continue
waiting for a uevent result from userspace until it gets one, because
the kernel can't do anything sensible until it gets that.

Dave
"

This change does that: the wait is retried whenever
wait_event_interruptible() returns ERESTARTSYS. We have it running in
automation with no problems so far, but comments are welcome.
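
As a rough illustration of the retry idea only (this is not the diff from
the patch, which is not reproduced in this cover letter), the loop can be
sketched in userspace C. Everything here is a hypothetical stand-in:
struct fake_lockspace, fake_wait_event_interruptible() and
do_event_sketch() are invented for the example, and ERESTARTSYS is
defined locally because it is a kernel-internal value that userspace
<errno.h> does not provide. The sketch does not model how pending
signals are handled in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel-internal error code; defined here as an assumption because
 * userspace <errno.h> does not expose it. */
#define ERESTARTSYS 512

/* Hypothetical stand-in for a lockspace waiting on a uevent result. */
struct fake_lockspace {
	int signals_pending;	/* number of waits that will be interrupted */
	int uevent_result;	/* result that "userspace" eventually reports */
	int wait_calls;		/* how many times the wait was attempted */
};

/*
 * Hypothetical stand-in for wait_event_interruptible(): returns
 * -ERESTARTSYS while an interruption is simulated, and 0 once the
 * awaited condition is considered true.
 */
static int fake_wait_event_interruptible(struct fake_lockspace *ls)
{
	ls->wait_calls++;
	if (ls->signals_pending > 0) {
		ls->signals_pending--;
		return -ERESTARTSYS;
	}
	return 0;
}

/*
 * Sketch of the retry: keep waiting until the wait ends for a reason
 * other than -ERESTARTSYS, then hand back the uevent result.
 */
static int do_event_sketch(struct fake_lockspace *ls)
{
	int error;

	assert(ls != NULL);

	do {
		error = fake_wait_event_interruptible(ls);
	} while (error == -ERESTARTSYS);

	if (error)
		return error;

	return ls->uevent_result;
}
```

With three simulated interruptions, do_event_sketch() calls the wait four
times before returning the stored uevent result, instead of giving up on
the first ERESTARTSYS.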

Mark Syms (1):
  Retry wait_event_interruptible in event of ERESTARTSYS

 fs/dlm/lockspace.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

-- 
1.8.3.1



