[dm-devel] dm-mq and end_clone_request()

Mike Snitzer snitzer at redhat.com
Thu Aug 4 15:10:28 UTC 2016


On Thu, Aug 04 2016 at  6:09am -0400,
Hannes Reinecke <hare at suse.de> wrote:

> On 08/04/2016 11:53 AM, Hannes Reinecke wrote:
> > On 08/03/2016 06:55 PM, Bart Van Assche wrote:
> >> On 08/02/2016 05:40 PM, Mike Snitzer wrote:
> >>> But I asked you to run the v4.7 kernel patches I
> >>> pointed to _without_ any of your debug patches.
> >>
> >> I need several patches to fix bugs that are not related to the device
> >> mapper, e.g. "sched: Avoid that __wait_on_bit_lock() hangs"
> >> (https://lkml.org/lkml/2016/8/3/289).
> >>
> > Hmm. Can you test with this patch?
> > 
> > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > index 7790a70..9daed03 100644
> > --- a/drivers/md/dm-mpath.c
> > +++ b/drivers/md/dm-mpath.c
> > @@ -439,8 +439,7 @@ static int must_push_back(struct multipath *m)
> >  {
> >         return (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) ||
> >                 ((test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) !=
> > -                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags)) &&
> > -                dm_noflush_suspending(m->ti)));
> > +                 test_bit(MPATHF_SAVED_QUEUE_IF_NO_PATH, &m->flags))));
> >  }
> > 
> >  /*
> > 
> > Reasoning:
> > The original check for dm_noflush_suspending() was for bio-based
> > drivers, which needed to queue I/O within the device-mapper core.
> > So during suspend this I/O would keep a reference to the device-mapper
> > core and the table couldn't be swapped.
> > For request-based multipathing, however, the I/O is _never_ held within
> > the device-mapper core but rather pushed back to the request queue.
> > I.e., even for pushback the I/O will never hold a reference to the
> > device-mapper core, and the tables can be swapped irrespective of the
> > 'dm_noflush_suspending()' setting.
> > 
> > Or that's the idea, at least :-)
> > 
> > Yes Mike, I know, it's not going to work with bio-based multipathing.
> > But this is just for figuring out where the real issue is.
> > 
> And indeed.
> 
> multipathd is calling DM_SUSPEND _without_ the noflush_suspending flag.
> (On the grounds that originally it needed to flush all I/O from the
> device-mapper core).
> Which will be causing I/O errors if any I/O is executed after
> ->presuspend has been called.

The only time multipathd doesn't use noflush is on resize.  Otherwise
I'm pretty sure it _does_ use noflush all the time.

But the point is that the map method shouldn't be called while the
multipath device is suspended.

I already provided fixes for this, staged here:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.8

and relative to 4.7:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-4.7-mpath-fixes

With these patches, our testing on a real SRP hardware testbed (fast DDN
backend) doesn't see any I/O errors.

But I'll revisit must_push_back relative to dm_noflush_suspending();
specifically the new must_push_back_rq() could be made to not check
dm_noflush_suspending().
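
To make the difference concrete, here is a minimal userspace sketch of the
two predicates, assuming the boolean shape of must_push_back() shown in the
quoted diff.  The struct, its field names and all three function names are
hypothetical stand-ins for illustration; this is not the kernel code and not
the actual must_push_back_rq().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the state must_push_back() reads. */
struct mp_state {
	bool queue_if_no_path;       /* models MPATHF_QUEUE_IF_NO_PATH */
	bool saved_queue_if_no_path; /* models MPATHF_SAVED_QUEUE_IF_NO_PATH */
	bool noflush_suspending;     /* models dm_noflush_suspending(m->ti) */
};

/* Same boolean shape as the must_push_back() in the quoted diff. */
static bool push_back_with_noflush(const struct mp_state *s)
{
	return s->queue_if_no_path ||
	       ((s->queue_if_no_path != s->saved_queue_if_no_path) &&
		s->noflush_suspending);
}

/* Assumed variant that does not look at the noflush state at all. */
static bool push_back_without_noflush(const struct mp_state *s)
{
	return s->queue_if_no_path ||
	       (s->queue_if_no_path != s->saved_queue_if_no_path);
}

/* Walk all eight states and count where the two variants disagree. */
static int count_disagreements(void)
{
	int n = 0;

	for (int bits = 0; bits < 8; bits++) {
		struct mp_state s = {
			.queue_if_no_path = bits & 1,
			.saved_queue_if_no_path = bits & 2,
			.noflush_suspending = bits & 4,
		};
		bool differ = push_back_with_noflush(&s) !=
			      push_back_without_noflush(&s);

		/* They differ only if queueing is off, saved is on, no noflush. */
		assert(differ == (!s.queue_if_no_path &&
				  s.saved_queue_if_no_path &&
				  !s.noflush_suspending));
		n += differ;
	}
	return n;
}
```

In this model the two variants disagree in exactly one state: queue_if_no_path
currently clear, the saved value set, and a suspend without noflush.  That is
the case in which dropping the dm_noflush_suspending() check would turn a
failure into a push back.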