[dm-devel] [PATCH 1/7] multipath: fix tur checker locking

Bart Van Assche Bart.VanAssche at wdc.com
Thu Feb 8 19:34:44 UTC 2018


On Thu, 2018-02-08 at 13:27 -0600, Benjamin Marzinski wrote:
> On Thu, Feb 08, 2018 at 06:19:32PM +0000, Bart Van Assche wrote:
> > On Wed, 2018-02-07 at 16:49 -0600, Benjamin Marzinski wrote:
> > > Commit 6e2423fd fixed a bug where the tur checker could cancel a
> > > detached thread after it had exitted. However in fixing this, the new
> > > code grabbed a mutex (to call condlog) while holding a spin_lock. To
> > > deal with that, and to try to keep with the maixim "lock data, not
> > > code", I've changed how the tur checker synchronizes with its thread.
> > > 
> > > Now, the tur checker creates joinable threads, and detaches them when
> > > the thread is finished or has timed out. To track the state of the
> > > threads, I've added a new variable to the checker context, ct->attached.
> > > When a thread starts, attached is set to 1. When the thread finishes, it
> > > saves the value of attached, and then zeros it out, while locked. If
> > > attached was set, it detaches itself.
> > > 
> > > When the tur checker gives up on a thread, it also saves and decrements
> > > ct->attached, while locked. At the same time it saves the value of
> > > ct->thread.  If attached was set, it cancels the thread, and then
> > > detaches it.
> > > 
> > > So the values that are protected by the spin lock are now ct->holders,
> > > ct->thread, and ct->attached. There are cases where the tur checker just
> > > wants to know if the thread is running. If so, its safe to simply read
> > > ct->thread without locking.  Also, if it knows that the thread isn't
> > > running, it can freely access all of these variables. I've left the
> > > locking in-place in these cases to make the static analyzers happy.
> > > However I have added comments stating when the locking isn't actually
> > > necessary.
> > 
> > Hello Ben,
> > 
> > Have you considered to move the condlog() statements out of the spinlock
> > section? I think that would lead to a much smaller and less contrived change
> > than the patch you posted.
> 
> Yes. I thought it was better to keep the amount of code locked under a
> spin_lock as small as possible. But it seems that I am alone in thinking
> my trade-off was worth it.  If both you and Martin object, I'm fine with
> redoing the patch to simply move the condlog(), since I don't believe
> that either tur_check_async_timeout() or pthread_cancel() can block.

Hello Ben,

Have you considered to convert the hldr_lock spinlock into a mutex? I think
that the hldr_lock spinlock got introduced by commit 892f7b333a03 ("multipath:
fix scsi async tur checker corruption"). However, I have not found an
explanation in the description of that commit why hldr_lock is a spinlock
and not any other type of synchronization primitive.

Thanks,

Bart.




More information about the dm-devel mailing list