[dm-devel] dm mpath: switch IO scheduler of underlying paths to "none" [was: Re: BFQ + dm-mpath]

Mike Snitzer snitzer at redhat.com
Fri Sep 8 17:07:28 UTC 2017


On Fri, Sep 08 2017 at 12:48pm -0400,
Jens Axboe <axboe at kernel.dk> wrote:

> On 09/08/2017 10:41 AM, Mike Snitzer wrote:
> > On Fri, Sep 08 2017 at  5:13P -0400,
> > Paolo Valente <paolo.valente at linaro.org> wrote:
> > 
> >>
> >>> Il giorno 07 set 2017, alle ore 17:52, Mike Snitzer <snitzer at redhat.com> ha scritto:
> >>>
> >>> On Tue, Sep 05 2017 at 10:15am -0400,
> >>> Bart Van Assche <Bart.VanAssche at wdc.com> wrote:
> >>>
> >>>> On Tue, 2017-09-05 at 09:56 +0200, Paolo Valente wrote:
> >>>>> Ok, my suspects seem confirmed: the path dm_mq_queue_rq -> map_request
> >>>>> -> setup_clone -> blk_rq_prep_clone creates a cloned request without
> >>>>> invoking e->type->ops.mq.prepare_request for the target elevator e.
> >>>>> The cloned request is therefore not initialized for the scheduler, but
> >>>>> it is however inserted into the scheduler by
> >>>>> blk_mq_sched_insert_request.  This seems an error for any scheduler
> >>>>> that needs to initialize fields in the incoming request, or in general
> >>>>> to take some preliminary action.
> >>>>>
> >>>>> Am I missing something here?
> >>>>
> >>>> (+Mike Snitzer)
> >>>>
> >>>> Mike, do you perhaps have the time to look into this memory leak?
> >>>
> >>> It isn't a memory leak, it is missing initialization in the case of
> >>> cloned requests (if I'm understanding Paolo correctly).
> >>>
> >>
> >> Exactly!
> >>
> >>> But cloned requests shouldn't be going through the scheduler.  Only the
> >>> original requests should.
> >>>
> >>> Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO
> >>> schedulers") switched from blk_mq_insert_request() to
> >>> blk_mq_sched_insert_request() and in doing so it opened dm-mpath up to
> >>> this bug.
> >>>
> >>> Could be we need to take steps to ensure the block layer still
> >>> supports bypassing the elevator by using direct insertion?
> >>>
> >>> Or blk_mq_sched_insert_request() needs updating to check if
> >>> e->type->ops.mq.prepare_request were actually performed and to fallback
> >>> to the !elevator case if not..
> >>>
> >>> Not sure on the fix, but I can look closer if others (like Jens or
> >>> Paolo) don't have quicker suggestions.
> >>>
> >>
> >> No quick suggestion from me :(
> >>
> >> Thanks for analyzing this bug,
> > 
> > Please see the following untested patch.  All
> > testing/review/comments/acks appreciated.
> > 
> > I elected to use elevator_change() rather than fiddle with adding a new
> > blk-mq elevator hook (e.g. ->request_prepared) to verify that each
> > blk-mq elevator enabled request did in fact get prepared.
> > 
> > Bart, please test this patch and reply with your review/feedback.
> > 
> > Jens, if you're OK with this solution please reply with your Ack and
> > I'll send it to Linus along with the rest of the handful of DM changes I
> > have for 4.14.
> 
> I am not - we used to have this elevator change functionality from
> inside the kernel, and finally got rid of it when certain drivers killed
> it. I don't want to be bringing it back.

Fine.

> Sounds like we have two issues here. One is that we run into issues with
> stacking IO schedulers, and the other is that we'd rather not have
> multiple schedulers in play for a stacked setup.
> 
> Maybe it'd be cleaner to have the dm-mq side of things not insert
> through the scheduler, but rather just FIFO on the target end?

That was how blk_insert_cloned_request() was before.  From the patch
header (you may have missed it):

"Commit bd166ef18 ("blk-mq-sched: add framework for MQ capable IO
schedulers") switched blk_insert_cloned_request() from using
blk_mq_insert_request() to blk_mq_sched_insert_request().  Which
incorrectly added elevator machinery into a call chain that isn't
supposed to have any."

So shouldn't blk_insert_cloned_request() be made to _not_ use
blk_mq_sched_insert_request()?  We'd need a new block interface
established, or equivalent open-coded in blk_insert_cloned_request(),
to handle direct dispatch to an mq request_queue's queue(s).

Mike




More information about the dm-devel mailing list