[dm-devel] [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout

Frank Mayhar fmayhar at google.com
Thu Oct 31 14:16:51 UTC 2013


On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote:
> On 10/31/13 03:09, Frank Mayhar wrote:
> > On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
> >> On Wed, Oct 30 2013 at 11:08am -0400,
> >> Frank Mayhar <fmayhar at google.com> wrote:
> >>
> >>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
> >>>> Any interest in this or should I just table it for >= v3.14?
> >>>
> >>> Sorry, I've been busy putting out another fire.  Yes, there's definitely
> >>> still interest.  I grabbed your revised patch and tested with it.
> >>> Unfortunately the timeout doesn't actually fire when requests are queued
> >>> due to queue_if_no_path; IIRC the block request queue timeout logic
> >>> wasn't triggering.  I planned to look into it more deeply figure out why
> >>> but I had to spend all last week fixing a nasty race and hadn't gotten
> >>> back to it yet.
> >>
> >> OK, Hannes, any idea why this might be happening?  The patch in question
> >> is here: https://patchwork.kernel.org/patch/3070391/
> > 
> > I got to this today and so far the most interesting I see is that the
> > cloned request that's queued in multipath has no queue associated with
> > it when it's queued; a printk reveals:
> > 
> > [  517.610042] map_io: queueing rq ffff8801150e0070 q           (null)
> > 
> > When it's eventually dequeued, it gets a queue from the destination
> > device (in the pgpath) via bdev_get_queue().
> > 
> > Because of this and from just looking at the code, blk_start_request()
> > (and therefore blk_add_timer()) isn't being called for those requests,
> > so there's never a chance that the timeout would happen.
> > 
> > Does this make sense?  Or am I totally off-base?
> 
> Hi,
> 
> I haven't checked the above patch in detail but there is a problem;
> abort_if_no_path() treats "rq" as a clone request, which it isn't.
> "rq" is an original request.
> 
> It shouldn't be a correct fix but just for testing purpose, you can try
> changing:
>   info = dm_get_rq_mapinfo(rq);
> to
>   info = dm_get_rq_mapinfo(rq->special);
> and see what happens.

Well, at the moment this is kind of moot since abort_if_no_path() isn't
being called.  But, regardless, don't we want to time out the clone
request?  That is, after all, what is being queued in map_io().
Unfortunately the clones don't appear to be associated with a request
queue; they're just put on multipath's internal queue.
-- 
Frank Mayhar
310-460-4042




More information about the dm-devel mailing list