[dm-devel] [PATCH v3 1/7] libmultipath: fix tur checker locking
Hannes Reinecke
hare at suse.de
Tue Feb 13 10:20:39 UTC 2018
On 02/13/2018 04:42 AM, Benjamin Marzinski wrote:
> Commit 6e2423fd fixed a bug where the tur checker could cancel a
> detached thread after it had exitted. However in fixing this, the new
> code grabbed a mutex (to call condlog) while holding a spin_lock. To
> deal with this, I've done away with the holder spin_lock completely, and
> replaced it with two atomic variables, based on a suggestion by Martin
> Wilck.
>
> The holder variable works exactly like before. When the checker is
> initialized, it is set to 1. When a thread is created it is incremented.
> When either the thread or the checker are done with the context, they
> atomically decrement the holder variable and check its value. If it
> is 0, they free the context. If it is 1, they never touch the context
> again.
>
> The other variable has changed. First, ct->running and ct->thread have
> switched uses. ct->thread is now only ever accessed by the checker,
> never the thread. If it is non-NULL, a thread has started up, but
> hasn't been dealt with by the checker yet. It is also obviously used
> by the checker to cancel the thread.
>
> ct->running is now an atomic variable. When the thread is started
> it is set to 1. When the checker wants to kill a thread, it atomically
> sets the value to 0 and reads the previous value. If it was 1,
> the checker cancels the thread. If it was 0, the nothing needs to be
> done. After the checker has dealt with the thread, it sets ct->thread
> to NULL.
>
> Right before the thread finishes and pops the cleanup handler, it
> atomically sets the value of ct->running to 0 and reads the previous
> value. If it was 1, the thread just pops the cleanup handler and exits.
> If it was 0, then the checker is trying to cancel the thread, and so the
> thread calls pause(), which is a cancellation point.
>
> Cc: Martin Wilck <mwilck at suse.com>
> Cc: Bart Van Assche <Bart.VanAssche at wdc.com>
> Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>
> ---
> libmultipath/checkers/tur.c | 107 ++++++++++++++++++--------------------------
> 1 file changed, 43 insertions(+), 64 deletions(-)
>
[ .. ]
> @@ -391,19 +368,17 @@ int libcheck_check(struct checker * c)
> ct->state = PATH_UNCHECKED;
> ct->fd = c->fd;
> ct->timeout = c->timeout;
> - pthread_spin_lock(&ct->hldr_lock);
> - ct->holders++;
> - pthread_spin_unlock(&ct->hldr_lock);
> + uatomic_add(&ct->holders, 1);
> + uatomic_set(&ct->running, 1);
> tur_set_async_timeout(c);
> setup_thread_attr(&attr, 32 * 1024, 1);
> r = pthread_create(&ct->thread, &attr, tur_thread, ct);
> pthread_attr_destroy(&attr);
> if (r) {
> - pthread_spin_lock(&ct->hldr_lock);
> - ct->holders--;
> - pthread_spin_unlock(&ct->hldr_lock);
> - pthread_mutex_unlock(&ct->lock);
> + uatomic_sub(&ct->holders, 1);
> + uatomic_set(&ct->running, 0);
> ct->thread = 0;
> + pthread_mutex_unlock(&ct->lock);
> condlog(3, "%s: failed to start tur thread, using"
> " sync mode", tur_devt(devt, sizeof(devt), ct));
> return tur_check(c->fd, c->timeout,
I would rather set 'ct->running' from within the thread; otherwise there
is a chance that pthread_create() returns without error, but the thread
itself hasn't been run yet.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare at suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
More information about the dm-devel
mailing list