[dm-devel] dm-mpath request merging concerns [was: Re: It's time to put together the schedule]

Tue Feb 24 00:38:16 UTC 2015

On Mon, Feb 23, 2015 at 07:39:00PM -0500, Mike Snitzer wrote:
> On Mon, Feb 23 2015 at  5:14pm -0500,
> Benjamin Marzinski <bmarzins at redhat.com> wrote:
> 
> > On Mon, Feb 23, 2015 at 05:46:37PM -0500, Mike Snitzer wrote:
> > > 
> > > It is blk_queue_bio(), via q->make_request_fn, that is intended to
> > > actually do the merging.  What I'm hearing is that we're only getting
> > > some small amount of merging if:
> > > 1) the 2 path case is used and therefore ->busy hook within
> > >    q->request_fn is not taking the request off the queue, so there is
> > >    more potential for later merging
> > > 2) the 4 path case IFF nr_requests is reduced to induce ->busy, which
> > >    only promoted merging as a side-effect like 1) above
> > > 
> > > The reality is we aren't getting merging where it _should_ be happening
> > > (in blk_queue_bio).  We need to understand why that is.
> > 
> > Huh? I'm confused.  If the merges that are happening (which are more
> > likely if either of those two points you mentioned are true) aren't
> > happening in blk_queue_bio, then where are they happening?
> 
> AFAICT, purely from this discussion and NetApp's BZ, the little merging
> that is seen is happening by the ->lld_busy_fn hook.  See the comment
> block above blk_lld_busy().

Well, that function is what's causing dm_request_fn to stop pulling
requests off the queue, through

                if (ti->type->busy && ti->type->busy(ti))
                        goto delay_and_out;

But all scsi_lld_busy (which is the function that eventually gets called
to signal that the queue is busy) does is check some flags and
other values. The actual merging code is in blk_queue_bio().
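
To make the point concrete, here is a minimal user-space sketch of that
loop. It is a toy model, not the kernel code: struct toy_target,
toy_busy() and toy_request_fn() are hypothetical names, and the depth
limit stands in for whatever flags and values the real busy check reads.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target; not the kernel's struct dm_target. */
struct toy_target {
	bool (*busy)(struct toy_target *t);	/* optional hook, may be NULL */
	int inflight;		/* requests already handed to the lower device */
	int busy_limit;		/* hook reports busy at or above this depth */
};

/* Like the real busy check, this only inspects state. It merges nothing. */
static bool toy_busy(struct toy_target *t)
{
	return t->inflight >= t->busy_limit;
}

/*
 * Pull requests off the upper queue until the target reports busy.
 * Returns how many requests are left on the upper queue, i.e. how many
 * are still available as merge candidates for later bios.
 */
static int toy_request_fn(struct toy_target *t, int queued)
{
	while (queued > 0) {
		if (t->busy && t->busy(t))
			break;		/* plays the role of "goto delay_and_out" */
		queued--;
		t->inflight++;
	}
	return queued;
}
```

With busy_limit = 4 and 10 queued requests, 6 stay behind on the upper
queue. With no busy hook (or one that never fires) the queue is drained
completely and nothing is left to merge with.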

>  
> > I thought that the issue is that requests are getting pulled off the
> > multipath device's request queue and placed on the underlying device's
> > request queue too quickly, so that there are no requests on multipath's
> > queue to merge with when blk_queue_bio() is called.  In this case, one
> > solution would involve keeping multipath from removing these requests
> > too quickly when we think that it is likely that another request which
> > can get merged will be added soon. That's what all my ideas have been
> > about.
> > 
> > Do you think something different is happening here? 
> 
> Requests are being pulled from the DM-multipath's queue if
> ->lld_busy_fn() is false.  Too quickly is all relative.  The case NetApp
> reported is with SSD devices in the backend.  Any increased idling in
> the interest of merging could hurt latency; but the merging may improve
> IOPS.  So it is a trade-off.

I'm not at all sure that there's going to be a one-size-fits-all
solution, and it is possible that for really fast devices, load balancing
may end up being not all that useful.

> So what I said before and am still saying is: we need to understand why
> the designed hook for merging, via q->make_request_fn's blk_queue_bio(),
> isn't actually meaningful for DM multipath.
> 
> Merging should happen _before_ q->request_fn() is called.  Not as a
> side-effect of q->request_fn() happening to have intelligence to not
> start the request because the underlying device queues are busy.

The merging is happening before dm_request_fn, if there are any requests
to actually merge with. If blk_queue_bio runs, and there are no requests
left in the queue for the multipath device, then there is no chance
of any merging happening, since there are no requests to merge with. The
issue is that when there are multiple really fast paths under multipath,
their queue never fills up and they always report that they aren't busy,
which means the only thing that device-mapper has to do to the requests
on its queue is put them on the appropriate queue of the underlying
device.  This doesn't take much time, and once it does this, no merging
is done on the underlying device queues. So if the requests spend more
of their time on the scsi device queues (where no merging happens) and
very little of their time on the multipath queue, then there simply
isn't time for merging to happen.  Merging in the underlying device
queues won't really help matters, since multipath will be spreading out
the requests among the various queues, so that contiguous requests won't
often be sent to the same underlying device (that's the whole point of
request-based multipath: doing the merging first, and then sending down
fully merged requests).
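
A rough illustration of why merging on the underlying queues can't make
up for it. This is a toy model under stated assumptions: contiguous,
unmerged requests of a fixed size, and strict per-request round-robin
path selection. toy_lower_merges() and the constants are hypothetical
names, not anything from dm-mpath.

```c
enum { REQ_SECTORS = 8, MAX_PATHS = 8 };

/*
 * Spread n_reqs contiguous requests over n_paths lower queues and count
 * how many could back-merge with the previous request on the same queue.
 * Requires 1 <= n_paths <= MAX_PATHS.
 */
static int toy_lower_merges(int n_reqs, int n_paths)
{
	long tail_end[MAX_PATHS];	/* sector just past the last request per path */
	int merges = 0;

	for (int p = 0; p < n_paths; p++)
		tail_end[p] = -1;

	for (int i = 0; i < n_reqs; i++) {
		int p = i % n_paths;	/* assumed: one request per path, in turn */
		long start = (long)i * REQ_SECTORS;

		if (tail_end[p] == start)
			merges++;	/* contiguous with the queue's tail */
		tail_end[p] = start + REQ_SECTORS;
	}
	return merges;
}
```

In this model 16 requests on a single path give 15 possible merges, but
with 2 or 4 paths the count drops to 0, because each lower queue only
ever sees every second or fourth request.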

What NetApp was seeing was single requests getting added to the
multipath device queue, and then getting pulled off and added to the
underlying device queue before another request could get added to the
multipath request queue.
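
The effect of residence time on the multipath queue can be sketched the
same way. Again a toy model, not a measurement: contiguous bios arrive
one per tick, and the queue is emptied every drain_every ticks.
toy_dispatched_requests() and BIO_SECTORS are hypothetical names.

```c
enum { BIO_SECTORS = 8 };

/*
 * Count how many requests reach the lower device when n_bios contiguous
 * bios arrive one per tick and the upper queue is drained every
 * drain_every ticks. Requires drain_every >= 1.
 */
static int toy_dispatched_requests(int n_bios, int drain_every)
{
	int dispatched = 0;
	int pending = 0;	/* requests currently on the upper queue */
	long tail_end = -1;	/* sector just past the last queued request */

	for (int tick = 0; tick < n_bios; tick++) {
		long start = (long)tick * BIO_SECTORS;

		if (pending > 0 && tail_end == start) {
			tail_end += BIO_SECTORS;	/* back-merge into queued request */
		} else {
			pending++;			/* nothing to merge with */
			tail_end = start + BIO_SECTORS;
		}

		if ((tick + 1) % drain_every == 0) {
			dispatched += pending;		/* queue handed to lower device */
			pending = 0;
		}
	}
	return dispatched + pending;	/* flush whatever is still queued */
}
```

Draining on every tick turns 16 bios into 16 single requests, which is
the pattern described above. Draining every 4 ticks gives 4 merged
requests, and draining once at the end gives 1.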

While I'm pretty sure that this is what's happening, I agree that making
dm_request_fn quit early may not be the best solution.  I'm not sure why
the queue is getting unplugged so quickly in the first place.  Perhaps
we should understand that first. If we're not calling dm_request_fn so
quickly, then we don't need to worry so much about stopping early.

-Ben