[dm-devel] [PATCH v2 07/10] libmultipath: handle TUR threads that can't be cancelled
Martin Wilck
mwilck at suse.com
Tue Oct 23 13:43:45 UTC 2018
When the tur checker code determines that a hanging TUR thread
couldn't be cancelled, rather than simply returning, reallocate
the checker context and start a new thread. This will leak some
memory if the hanging thread never wakes up again, but in that
highly unlikely case we're leaking a thread anyway.
Signed-off-by: Martin Wilck <mwilck at suse.com>
---
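For readers who want to follow the holder-count hand-off in isolation,
here is a minimal standalone sketch of the idea. It is not the tur.c
code: it assumes C11 <stdatomic.h> in place of liburcu's uatomic_*
helpers, and struct chk, struct ctx, ctx_new(), ctx_get(), ctx_put(),
chk_replace_stale() and ctx_freed are hypothetical names made up for
illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-path checker context. */
struct ctx {
	atomic_int holders;	/* 1 for the checker, +1 while a thread runs */
};

/* Hypothetical stand-in for struct checker: only the context pointer. */
struct chk {
	struct ctx *context;
};

/* Counts freed contexts so the behaviour can be observed in a test. */
static atomic_int ctx_freed;

/* Allocate a context owned by the checker alone (holders == 1). */
static struct ctx *ctx_new(void)
{
	struct ctx *ct = calloc(1, sizeof(*ct));

	if (ct)
		atomic_init(&ct->holders, 1);
	return ct;
}

/* Take a reference, as the checker would before starting a thread. */
static void ctx_get(struct ctx *ct)
{
	atomic_fetch_add(&ct->holders, 1);
}

/* Drop a reference; whoever drops the last one frees the context. */
static void ctx_put(struct ctx *ct)
{
	int left = atomic_fetch_sub(&ct->holders, 1) - 1;

	assert(left >= 0);
	if (left == 0) {
		free(ct);
		atomic_fetch_add(&ctx_freed, 1);
	}
}

/*
 * Replace a context whose thread did not respond to cancellation.
 * Returns 0 on success, or -1 if no new context could be allocated,
 * in which case the old context stays in place untouched.
 */
static int chk_replace_stale(struct chk *c)
{
	struct ctx *old = c->context;
	struct ctx *fresh = ctx_new();

	if (!fresh)
		return -1;
	c->context = fresh;
	/* If the stale thread is still alive, it frees "old" later. */
	ctx_put(old);
	return 0;
}
```

In this sketch, a checker that called ctx_get() for a thread that then
hangs can call chk_replace_stale() and carry on with the fresh context.
The old context is freed by whichever side calls ctx_put() last, so it
is leaked only if the stale thread never drops its reference.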
libmultipath/checkers/tur.c | 24 +++++++++++++++++++++---
1 file changed, 21 insertions(+), 3 deletions(-)
diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 41210892..a6c88eb2 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -349,11 +349,29 @@ int libcheck_check(struct checker * c)
 		}
 	} else {
 		if (uatomic_read(&ct->holders) > 1) {
-			/* The thread has been cancelled but hasn't
-			 * quit. exit with timeout. */
+			/*
+			 * The thread has been cancelled but hasn't quit.
+			 * We have to prevent it from interfering with the new
+			 * thread. We create a new context and leave the old
+			 * one with the stale thread, hoping it will clean up
+			 * eventually.
+			 */
 			condlog(3, "%d:%d : tur thread not responding",
 				major(ct->devt), minor(ct->devt));
-			return PATH_TIMEOUT;
+
+			/*
+			 * libcheck_init will replace c->context.
+			 * It fails only in OOM situations. In this case, return
+			 * PATH_UNCHECKED to avoid prematurely failing the path.
+			 */
+			if (libcheck_init(c) != 0)
+				return PATH_UNCHECKED;
+
+			if (!uatomic_sub_return(&ct->holders, 1))
+				/* It did terminate, eventually */
+				cleanup_context(ct);
+
+			ct = c->context;
 		}
 		/* Start new TUR checker */
 		pthread_mutex_lock(&ct->lock);
--
2.19.1