[dm-devel] blk-mq request allocation stalls [was: Re: [PATCH v3 0/8] dm: add request-based blk-mq support]

Wed Jan 7 17:35:30 UTC 2015

On 01/07/2015 10:18 AM, Mike Snitzer wrote:
> On Wed, Jan 07 2015 at 11:24am -0500,
> Jens Axboe <axboe at kernel.dk> wrote:
>
>> On 01/07/2015 09:22 AM, Mike Snitzer wrote:
>>> On Wed, Jan 07 2015 at 11:15am -0500,
>>> Mike Snitzer <snitzer at redhat.com> wrote:
>>>
>>>> On Wed, Jan 07 2015 at 10:32am -0500,
>>>> Jens Axboe <axboe at kernel.dk> wrote:
>>>>
>>>>> You forgot dm-1, that's what mkfs is sleeping on. But given that
>>>>> sdc/sdd look idle, it still looks like a case of dm-1 not
>>>>> appropriately running the queues after insertion.
>>>>
>>>> DM never directly runs the queues of the underlying SCSI devices
>>>> (e.g. sdc, sdd).
>>>>
>>>> request-based DM runs the DM device's queue on completion of a clone
>>>> request:
>>>>
>>>> dm_end_request -> rq_completed -> blk_run_queue_async
>>>>
>>>> Which ultimately does seem to be the wrong way around (like you say:
>>>> queue should run after insertion).
>>>
>>> Hmm, for q->mq_ops blk_insert_cloned_request() should already be running
>>> the queue.
>>>
>>> blk_insert_cloned_request is calling blk_mq_insert_request(rq, false, true, true);
>>>
>>> Third arg being @run_queue which results in blk_mq_run_hw_queue() being
>>> called.
>>
>> OK, that should be fine then. In that case, it's probably a missing
>> queue run in some other condition... Which does make more sense,
>> since "most" of the runs Bart did looked fine, it's just a slow one
>> every now and then.
>
> The one case that looks suspect (missing queue run) is if/when the
> multipath target's mapping function returns DM_MAPIO_REQUEUE.  It only
> returns this if memory allocation failed (e.g. blk_get_request returns
> ENOMEM).  Not seeing why DM core wouldn't want to re-run the DM device's
> queue in this case given it just called blk_requeue_request().  But that
> seems like something I need to revisit and not the ultimate problem.
>
> Looking closer, why not have blk_run_queue() (and blk_run_queue_async)
> call blk_mq_start_stopped_hw_queues() if q->mq_ops?  As is
> scsi_run_queue() open-codes it.
>
> Actually, that is likely the ultimate problem: blk_run_queue* aren't
> trained for q->mq_ops.  As such DM would need to open code a call to
> blk_mq_start_stopped_hw_queues.

It's not completely parallel how SCSI uses it. blk_run_queue(), for 
legacy request_fn, does not start stopped queues. For the mq path, 
scsi-mq decided to do that. So if we embed 
blk_mq_start_stopped_hw_queues() in blk_run_queue*(), then we'd have 
different behavior between mq and non-mq. We could have 
blk_start_queue() do the right thing, but that would require different 
contexts between mq and non-mq, as non-mq requires that to be called 
with the queue lock held and ints disabled.

-- 
Jens Axboe