[dm-devel] [PATCH v2] block: directly insert blk-mq request from blk_insert_cloned_request()
Jens Axboe
axboe at kernel.dk
Thu Sep 14 16:34:36 UTC 2017
On 09/14/2017 10:33 AM, Ming Lei wrote:
> On Fri, Sep 15, 2017 at 12:30 AM, Jens Axboe <axboe at kernel.dk> wrote:
>> On 09/14/2017 09:57 AM, Ming Lei wrote:
>>> On Tue, Sep 12, 2017 at 5:27 AM, Jens Axboe <axboe at kernel.dk> wrote:
>>>> On 09/11/2017 03:13 PM, Mike Snitzer wrote:
>>>>> On Mon, Sep 11 2017 at 4:51pm -0400,
>>>>> Jens Axboe <axboe at kernel.dk> wrote:
>>>>>
>>>>>> On 09/11/2017 10:16 AM, Mike Snitzer wrote:
>>>>>>> Here is v2 that should obviate the need to rename blk_mq_insert_request
>>>>>>> (by using bools to control run_queue and async).
>>>>>>>
>>>>>>> As for inserting directly into dispatch, if that can be done that is
>>>>>>> great but I'd prefer to have that be a follow-up optimization. This
>>>>>>> fixes the regression in question, and does so in well-known terms.
>>>>>>>
>>>>>>> What do you think?
>>>>>>
>>>>>> I think it looks reasonable. My only concern is the use of the software
>>>>>> queues. Depending on the scheduler, they may or may not be used. I'd
>>>>>> need to review the code, but my first thought is that this would break
>>>>>> if you use blk_mq_insert_request() on a device that is managed by
>>>>>> mq-deadline or bfq, for instance. Schedulers are free to use the
>>>>>> software queues, but they are also free to ignore them and use internal
>>>>>> queuing.
>>>>>>
>>>>>> Looking at the code, looks like this was changed slightly at some point,
>>>>>> we always flush the software queues, if any of them contain requests. So
>>>>>> it's probably fine.
>>>>>
>>>>> OK good, but is that too brittle to rely on? Something that might change
>>>>> in the future?
>>>>
>>>> I'm actually surprised we do flush software queues for that case, since
>>>> we don't always have to. So it is a bit of a wart. If we don't have a
>>>> scheduler, software queues is where IO goes. If we have a scheduler, the
>>>> scheduler has complete control of where to queue IO. Generally, the
>>>> scheduler will either utilize the software queues or it won't, there's
>>>> nothing in between.
>>>>
>>>> I know realize I'm an idiot and didn't read it right. So here's the code
>>>> in question:
>>>>
>>>> const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request;
>>>>
>>>> [...]
>>>>
>>>> } else if (!has_sched_dispatch) {
>>>> blk_mq_flush_busy_ctxs(hctx, &rq_list);
>>>> blk_mq_dispatch_rq_list(q, &rq_list);
>>>> }
>>>>
>>>> so we do only enter sw queue flushing, if we don't have a scheduler with
>>>> a dispatch_request hook. So now I am really wondering how your patch
>>>> could work if the bottom device has bfq or mq-deadline attached?
>>>>
>>>>>> My earlier suggestion to use just hctx->dispatch for the IO and bypass
>>>>>> the software queues completely. The use case for the dispatch list is
>>>>>> the same, regardless of whether the device has a scheduler attached or
>>>>>> not.
>>>>>
>>>>> I'm missing how these details relate to the goal of bypassing any
>>>>> scheduler that might be attached. Are you saying the attached elevator
>>>>> would still get in the way?
>>>>
>>>> See above.
>>>>
>>>>> Looking at blk_mq_sched_insert_request(), submission when an elevator
>>>>> isn't attached is exactly what I made blk_mq_insert_request() do
>>>>> (which is exactly what it did in the past).
>>>>
>>>> Right, but that path is only used if we don't have a scheduler attached.
>>>> So while the code will use that path IFF a scheduler isn't attached to
>>>> that device, your use case will use it for both cases.
>>>>
>>>>> In the case of DM multipath, nothing else should be submitting IO to
>>>>> the device so elevator shouldn't be used -- only interface for
>>>>> submitting IO would be blk_mq_insert_request(). So even if a
>>>>> scheduler is attached it should be bypassed right?
>>>>
>>>> The problem is the usage of the sw queue.
>>>>
>>>> Does the below work for you?
>>>>
>>>>
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index d709c0e3a2ac..aebe676225e6 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -2342,7 +2342,12 @@ blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *
>>>> if (q->mq_ops) {
>>>> if (blk_queue_io_stat(q))
>>>> blk_account_io_start(rq, true);
>>>> - blk_mq_sched_insert_request(rq, false, true, false, false);
>>>> + /*
>>>> + * Since we have a scheduler attached on the top device,
>>>> + * bypass a potential scheduler on the bottom device for
>>>> + * insert.
>>>> + */
>>>> + blk_mq_request_bypass_insert(rq);
>>>> return BLK_STS_OK;
>>>> }
>>>>
>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>> index 3f18cff80050..98a18609755e 100644
>>>> --- a/block/blk-mq.c
>>>> +++ b/block/blk-mq.c
>>>> @@ -1401,6 +1401,22 @@ void __blk_mq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>> blk_mq_hctx_mark_pending(hctx, ctx);
>>>> }
>>>>
>>>> +/*
>>>> + * Should only be used carefully, when the caller knows we want to
>>>> + * bypass a potential IO scheduler on the target device.
>>>> + */
>>>> +void blk_mq_request_bypass_insert(struct request *rq)
>>>> +{
>>>> + struct blk_mq_ctx *ctx = rq->mq_ctx;
>>>> + struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);
>>>> +
>>>> + spin_lock(&hctx->lock);
>>>> + list_add_tail(&rq->queuelist, &hctx->dispatch);
>>>> + spin_unlock(&hctx->lock);
>>>> +
>>>> + blk_mq_run_hw_queue(hctx, false);
>>>> +}
>>>> +
>>>
>>> Hello Jens and Mike,
>>>
>>> This patch sends flush request to ->dispatch directly too, which changes the
>>> previous behaviour, is that OK for dm-rq?
>>
>> That's a good question, I need to look into that. The flush behavior is so
>> annoying. Did you make any progress on fixing up the patch you posted the
>> other day on treating flushes like any other request?
>
> It has been ready, will post it out later.
OK good, if that's clean enough, then I think going that route is a much
better idea than introducing more flush/not-flush logic. I liked the
initial patch from a concept point of view, and it enables us to get rid
of a few nasty hacks.
--
Jens Axboe
More information about the dm-devel
mailing list