[dm-devel] [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout
Frank Mayhar
fmayhar at google.com
Thu Oct 31 14:16:51 UTC 2013
On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote:
> On 10/31/13 03:09, Frank Mayhar wrote:
> > On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
> >> On Wed, Oct 30 2013 at 11:08am -0400,
> >> Frank Mayhar <fmayhar at google.com> wrote:
> >>
> >>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
> >>>> Any interest in this or should I just table it for >= v3.14?
> >>>
> >>> Sorry, I've been busy putting out another fire. Yes, there's definitely
> >>> still interest. I grabbed your revised patch and tested with it.
> >>> Unfortunately the timeout doesn't actually fire when requests are queued
> >>> due to queue_if_no_path; IIRC the block request queue timeout logic
> >>> wasn't triggering. I planned to look into it more deeply to figure out why
> >>> but I had to spend all last week fixing a nasty race and hadn't gotten
> >>> back to it yet.
> >>
> >> OK, Hannes, any idea why this might be happening? The patch in question
> >> is here: https://patchwork.kernel.org/patch/3070391/
> >
> > I got to this today and so far the most interesting thing I see is that the
> > cloned request that's queued in multipath has no queue associated with
> > it when it's queued; a printk reveals:
> >
> > [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null)
> >
> > When it's eventually dequeued, it gets a queue from the destination
> > device (in the pgpath) via bdev_get_queue().
> >
> > Because of this and from just looking at the code, blk_start_request()
> > (and therefore blk_add_timer()) isn't being called for those requests,
> > so there's never a chance that the timeout would happen.
> >
> > Does this make sense? Or am I totally off-base?
>
> Hi,
>
> I haven't checked the above patch in detail but there is a problem;
> abort_if_no_path() treats "rq" as a clone request, which it isn't.
> "rq" is an original request.
>
> This isn't meant as a correct fix, but just for testing purposes, you can
> try changing:
> 	info = dm_get_rq_mapinfo(rq);
> to
> 	info = dm_get_rq_mapinfo(rq->special);
> and see what happens.

Well, at the moment this is kind of moot since abort_if_no_path() isn't
being called. But, regardless, don't we want to time out the clone
request? That is, after all, what is being queued in map_io().

Unfortunately the clones don't appear to be associated with a request
queue; they're just put on multipath's internal queue.
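
To make the point concrete, here is a toy userspace model of the situation
described above. It is not kernel code and not taken from the patch: every
toy_* name is made up for illustration. toy_start_request() stands in for
blk_start_request() (the only place the timer gets armed),
toy_queue_if_no_path() stands in for the queueing branch of map_io(), and
toy_dispatch() stands in for the later dequeue, where the clone picks up
the destination device's queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model only; all names here are hypothetical. */
struct toy_queue {
	unsigned long timeout_ticks;	/* per-queue request timeout */
};

struct toy_request {
	struct toy_queue *q;		/* NULL while parked on the internal list */
	bool timer_armed;
	unsigned long deadline;
};

/* Stand-in for blk_start_request(): the only place the timer is armed. */
static void toy_start_request(struct toy_request *rq, unsigned long now)
{
	assert(rq->q != NULL);		/* cannot start a request without a queue */
	rq->timer_armed = true;
	rq->deadline = now + rq->q->timeout_ticks;
}

/* Stand-in for map_io() with no usable path: park the clone, never start it. */
static void toy_queue_if_no_path(struct toy_request *rq)
{
	rq->q = NULL;
	rq->timer_armed = false;
	rq->deadline = 0;
}

/* Stand-in for the later dequeue: the clone gets the destination's queue. */
static void toy_dispatch(struct toy_request *rq, struct toy_queue *dest,
			 unsigned long now)
{
	rq->q = dest;
	toy_start_request(rq, now);
}

/* A request can only time out if its timer was ever armed. */
static bool toy_timed_out(const struct toy_request *rq, unsigned long now)
{
	return rq->timer_armed && now >= rq->deadline;
}
```

In this model a parked request never reports a timeout no matter how long it
waits, because nothing armed its timer; only after toy_dispatch() does the
deadline start to count.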
--
Frank Mayhar
310-460-4042