[dm-devel] [PATCH v3 01/19] libmultipath: fix tur checker timeout

Benjamin Marzinski bmarzins at redhat.com
Fri Oct 5 17:02:59 UTC 2018


On Fri, Oct 05, 2018 at 12:11:18PM +0200, Martin Wilck wrote:
> On Thu, 2018-10-04 at 11:31 -0500,  Benjamin Marzinski  wrote:
> > It's been a while, and I'm not exactly sure what I was thinking here.
> > But IIRC the idea was that if the state isn't set yet, then the old
> > thread could mess with the results of a future thread. Also, it was
> > to
> > avoid a corner case where the tur checker was called, started the
> > thread, got the result before the checker exitted, cancelled the
> > thread
> > and returned the result and then was called very shortly afterwards,
> > before the thread could clean itself up.  In that case, the right
> > thing
> > to do (I thought) would be to start a new thread, because the other
> > one
> > would be ending soon.  In truth, starting a new thread at all is
> > probably a bad idea, since the old thread will still mess with the
> > checker context on exit. 
> > 
> > A better idea might be to simply fail back to a syncronous call to
> > tur_check() when you notice a cancelled thread that hasn't exitted.
> > That can cause all the other checkers to get delayed, but at least in
> > that case you are still checking paths. The other option is to do as
> > before and just not check this path, which won't effect the other
> > checkers. It seems very unlikely that the thread could remain
> > uncancelled forever, especially if the path was usable.
> >
> > thoughts?
> 
> Generally speaking, we're deeply in the realm of highly unlikely
> situations I would say. But we should get it right eventually.
> 
> Maybe we can add logic to the tur thread to keep its hands off the
> context if it's a "zombie", like below (just a thought, untested)?

This still wouldn't stop a thread from racing with new thread creation
to change ct->holders or ct->running.

-Ben

> Martin
> 
> 
> diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
> index bf8486d3..e8493ca8 100644
> --- a/libmultipath/checkers/tur.c
> +++ b/libmultipath/checkers/tur.c
> @@ -219,11 +219,18 @@ static void cleanup_func(void *data)
>  	rcu_unregister_thread();
>  }
>  
> +#define tur_thread_quit_unless_owner(__ct)	   \
> +	if (__ct->thread != pthread_self()) {	   \
> +		pthread_mutex_unlock(&__ct->lock); \
> +		pthread_exit(NULL);		   \
> +	}
> +
>  static void copy_msg_to_tcc(void *ct_p, const char *msg)
>  {
>  	struct tur_checker_context *ct = ct_p;
>  
>  	pthread_mutex_lock(&ct->lock);
> +	tur_thread_quit_unless_owner(ct);
>  	strlcpy(ct->message, msg, sizeof(ct->message));
>  	pthread_mutex_unlock(&ct->lock);
>  }
> @@ -243,6 +250,7 @@ static void *tur_thread(void *ctx)
>  
>  	/* TUR checker start up */
>  	pthread_mutex_lock(&ct->lock);
> +	tur_thread_quit_unless_owner(ct);
>  	ct->state = PATH_PENDING;
>  	ct->message[0] = '\0';
>  	pthread_mutex_unlock(&ct->lock);
> @@ -252,6 +260,7 @@ static void *tur_thread(void *ctx)
>  
>  	/* TUR checker done */
>  	pthread_mutex_lock(&ct->lock);
> +	tur_thread_quit_unless_owner(ct);
>  	ct->state = state;
>  	pthread_cond_signal(&ct->active);
>  	pthread_mutex_unlock(&ct->lock);
> 
> 
> 
> 
> -- 
> Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
> SUSELinux GmbH, GF: Felix Imendörffer, Jane Smithard,  Graham Norton
> HRB 21284 (AG Nürnberg)
> 




More information about the dm-devel mailing list