[dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking

Hannes Reinecke hare at suse.de
Tue Feb 13 10:20:39 UTC 2018


On 02/13/2018 04:42 AM, Benjamin Marzinski wrote:
> Commit 6e2423fd fixed a bug where the tur checker could cancel a
> detached thread after it had exitted. However in fixing this, the new
> code grabbed a mutex (to call condlog) while holding a spin_lock.  To
> deal with this, I've done away with the holder spin_lock completely, and
> replaced it with two atomic variables, based on a suggestion by Martin
> Wilck.
> 
> The holder variable works exactly like before.  When the checker is
> initialized, it is set to 1. When a thread is created it is incremented.
> When either the thread or the checker are done with the context, they
> atomically decrement the holder variable and check its value. If it
> is 0, they free the context. If it is 1, they never touch the context
> again.
> 
> The other variable has changed. First, ct->running and ct->thread have
> switched uses. ct->thread is now only ever accessed by the checker,
> never the thread.  If it is non-NULL, a thread has started up, but
> hasn't been dealt with by the checker yet. It is also obviously used
> by the checker to cancel the thread.
> 
> ct->running is now an atomic variable.  When the thread is started
> it is set to 1. When the checker wants to kill a thread, it atomically
> sets the value to 0 and reads the previous value.  If it was 1,
> the checker cancels the thread. If it was 0, the nothing needs to be
> done.  After the checker has dealt with the thread, it sets ct->thread
> to NULL.
> 
> Right before the thread finishes and pops the cleanup handler, it
> atomically sets the value of ct->running to 0 and reads the previous
> value. If it was 1, the thread just pops the cleanup handler and exits.
> If it was 0, then the checker is trying to cancel the thread, and so the
> thread calls pause(), which is a cancellation point.
> 
> Cc: Martin Wilck <mwilck at suse.com>
> Cc: Bart Van Assche <Bart.VanAssche at wdc.com>
> Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>
> ---
>  libmultipath/checkers/tur.c | 107 ++++++++++++++++++--------------------------
>  1 file changed, 43 insertions(+), 64 deletions(-)
> 
[ .. ]
> @@ -391,19 +368,17 @@ int libcheck_check(struct checker * c)
>  		ct->state = PATH_UNCHECKED;
>  		ct->fd = c->fd;
>  		ct->timeout = c->timeout;
> -		pthread_spin_lock(&ct->hldr_lock);
> -		ct->holders++;
> -		pthread_spin_unlock(&ct->hldr_lock);
> +		uatomic_add(&ct->holders, 1);
> +		uatomic_set(&ct->running, 1);
>  		tur_set_async_timeout(c);
>  		setup_thread_attr(&attr, 32 * 1024, 1);
>  		r = pthread_create(&ct->thread, &attr, tur_thread, ct);
>  		pthread_attr_destroy(&attr);
>  		if (r) {
> -			pthread_spin_lock(&ct->hldr_lock);
> -			ct->holders--;
> -			pthread_spin_unlock(&ct->hldr_lock);
> -			pthread_mutex_unlock(&ct->lock);
> +			uatomic_sub(&ct->holders, 1);
> +			uatomic_set(&ct->running, 0);
>  			ct->thread = 0;
> +			pthread_mutex_unlock(&ct->lock);
>  			condlog(3, "%s: failed to start tur thread, using"
>  				" sync mode", tur_devt(devt, sizeof(devt), ct));
>  			return tur_check(c->fd, c->timeout,
I would rather set 'ct->running' from within the thread; otherwise there
is a chance that pthread_create() returns without error, but the thread
itself hasn't been run yet.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)




More information about the dm-devel mailing list