[dm-devel] dm-mq and end_clone_request()

Mike Snitzer snitzer at redhat.com
Mon Aug 1 19:15:08 UTC 2016


On Mon, Aug 01 2016 at  2:55pm -0400,
Bart Van Assche <bart.vanassche at sandisk.com> wrote:

> On 08/01/2016 10:59 AM, Mike Snitzer wrote:
> >This says to me that must_push_back is returning false because
> >dm_noflush_suspending() is false.  When this happens -EIO will escape up
> >the IO stack.
> >
> >And this confirms that must_push_back() calling dm_noflush_suspending()
> >is quite suspect given queue_if_no_path was configured: we should
> >_always_ pushback if no paths are available.
> >
> >I'll dig deeper on really understanding _why_ must_push_back() is coded
> >like it is.
> 
> Hello Mike,
> 
> Earlier I had reported that I observe this behavior with
> CONFIG_DM_MQ_DEFAULT=y after the first simulated cable pull. I have
> been able to reproduce this behavior with CONFIG_DM_MQ_DEFAULT=n but
> it takes a large number of iterations to trigger this behavior. The
> output that appears on my setup in the kernel log with a bunch of
> printk()'s added in the dm-mpath driver for CONFIG_DM_MQ_DEFAULT=n
> is as follows (mpath 254:0 and /dev/mapper/mpathbe refer to the same
> multipath device):
> 
> [  314.755582] mpath 254:0: queue_if_no_path 0 -> 1
> [  314.770571] executing DM ioctl DEV_SUSPEND on mpathbe
> [  314.770622] mpath 254:0: queue_if_no_path 1 -> 0
> [  314.770657] __multipath_map(): (a) returning -5
> [  314.770657] map_request(): clone_and_map_rq() returned -5
> [  314.770658] dm_complete_request: error = -5

OK, that makes no sense at all (nevermind that your trace really isn't
useful, my debug patch is more so).

The old .request_fn code checks !blk_queue_stopped() in dm_request_fn().
When a dm-mpath device is suspended the queue gets stopped (same goes
for dm-mq).  So once a request-based DM device is suspended the .map_rq
and .clone_and_map_rq hooks shouldn't get called.

BUT the difference between .request_fn and dm-mq is that dm-mq doesn't
ever check if the queue if stopped.  We rely on blk-core to ensure that
IO is not submitted (via dm_mq_queue_rq) while a device is suspended
(and because the queue is stopped it shouldn't).

Now you're saying both cases aren't working.. which is really confusing.
Basically the request-based DM core should protect against requests
being mapped while a device is suspended... seems to be mounting
evidence that isn't the case.




More information about the dm-devel mailing list