[dm-devel] [PATCH v3 01/19] libmultipath: fix tur checker timeout
Benjamin Marzinski
bmarzins at redhat.com
Fri Oct 5 17:02:59 UTC 2018
On Fri, Oct 05, 2018 at 12:11:18PM +0200, Martin Wilck wrote:
> On Thu, 2018-10-04 at 11:31 -0500, Benjamin Marzinski wrote:
> > It's been a while, and I'm not exactly sure what I was thinking here.
> > But IIRC the idea was that if the state isn't set yet, then the old
> > thread could mess with the results of a future thread. Also, it was
> > to
> > avoid a corner case where the tur checker was called, started the
> > thread, got the result before the checker exitted, cancelled the
> > thread
> > and returned the result and then was called very shortly afterwards,
> > before the thread could clean itself up. In that case, the right
> > thing
> > to do (I thought) would be to start a new thread, because the other
> > one
> > would be ending soon. In truth, starting a new thread at all is
> > probably a bad idea, since the old thread will still mess with the
> > checker context on exit.
> >
> > A better idea might be to simply fail back to a syncronous call to
> > tur_check() when you notice a cancelled thread that hasn't exitted.
> > That can cause all the other checkers to get delayed, but at least in
> > that case you are still checking paths. The other option is to do as
> > before and just not check this path, which won't effect the other
> > checkers. It seems very unlikely that the thread could remain
> > uncancelled forever, especially if the path was usable.
> >
> > thoughts?
>
> Generally speaking, we're deeply in the realm of highly unlikely
> situations I would say. But we should get it right eventually.
>
> Maybe we can add logic to the tur thread to keep its hands off the
> context if it's a "zombie", like below (just a thought, untested)?
This still wouldn't stop a thread from racing with new thread creation
to change ct->holders or ct->running.
-Ben
> Martin
>
>
> diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
> index bf8486d3..e8493ca8 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -219,11 +219,18 @@ static void cleanup_func(void *data)
> rcu_unregister_thread();
> }
>
> +#define tur_thread_quit_unless_owner(__ct) \
> + if (__ct->thread != pthread_self()) { \
> + pthread_mutex_unlock(&__ct->lock); \
> + pthread_exit(NULL); \
> + }
> +
> static void copy_msg_to_tcc(void *ct_p, const char *msg)
> {
> struct tur_checker_context *ct = ct_p;
>
> pthread_mutex_lock(&ct->lock);
> + tur_thread_quit_unless_owner(ct);
> strlcpy(ct->message, msg, sizeof(ct->message));
> pthread_mutex_unlock(&ct->lock);
> }
> @@ -243,6 +250,7 @@ static void *tur_thread(void *ctx)
>
> /* TUR checker start up */
> pthread_mutex_lock(&ct->lock);
> + tur_thread_quit_unless_owner(ct);
> ct->state = PATH_PENDING;
> ct->message[0] = '\0';
> pthread_mutex_unlock(&ct->lock);
> @@ -252,6 +260,7 @@ static void *tur_thread(void *ctx)
>
> /* TUR checker done */
> pthread_mutex_lock(&ct->lock);
> + tur_thread_quit_unless_owner(ct);
> ct->state = state;
> pthread_cond_signal(&ct->active);
> pthread_mutex_unlock(&ct->lock);
>
>
>
>
> --
> Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
> SUSELinux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
>
More information about the dm-devel
mailing list