[dm-devel] [RFC PATCH v2] dm mpath: add a queue_if_no_path timeout

Junichi Nomura j-nomura at ce.jp.nec.com
Fri Nov 1 01:58:54 UTC 2013


On 10/31/13 23:16, Frank Mayhar wrote:
> On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote:
>> On 10/31/13 03:09, Frank Mayhar wrote:
>>> On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
>>>> On Wed, Oct 30 2013 at 11:08am -0400,
>>>> Frank Mayhar <fmayhar at google.com> wrote:
>>>>
>>>>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
>>>>>> Any interest in this or should I just table it for >= v3.14?
>>>>>
>>>>> Sorry, I've been busy putting out another fire.  Yes, there's definitely
>>>>> still interest.  I grabbed your revised patch and tested with it.
>>>>> Unfortunately the timeout doesn't actually fire when requests are queued
>>>>> due to queue_if_no_path; IIRC the block request queue timeout logic
>>>>> wasn't triggering.  I planned to look into it more deeply figure out why
>>>>> but I had to spend all last week fixing a nasty race and hadn't gotten
>>>>> back to it yet.
>>>>
>>>> OK, Hannes, any idea why this might be happening?  The patch in question
>>>> is here: https://patchwork.kernel.org/patch/3070391/
>>>
>>> I got to this today and so far the most interesting I see is that the
>>> cloned request that's queued in multipath has no queue associated with
>>> it when it's queued; a printk reveals:
>>>
>>> [  517.610042] map_io: queueing rq ffff8801150e0070 q           (null)
>>>
>>> When it's eventually dequeued, it gets a queue from the destination
>>> device (in the pgpath) via bdev_get_queue().
>>>
>>> Because of this and from just looking at the code, blk_start_request()
>>> (and therefore blk_add_timer()) isn't being called for those requests,
>>> so there's never a chance that the timeout would happen.
>>>
>>> Does this make sense?  Or am I totally off-base?
>>
>> Hi,
>>
>> I haven't checked the above patch in detail but there is a problem;
>> abort_if_no_path() treats "rq" as a clone request, which it isn't.
>> "rq" is an original request.
>>
>> It shouldn't be a correct fix but just for testing purpose, you can try
>> changing:
>>   info = dm_get_rq_mapinfo(rq);
>> to
>>   info = dm_get_rq_mapinfo(rq->special);
>> and see what happens.
> 
> Well, at the moment this is kind of moot since abort_if_no_path() isn't
> being called.  But, regardless, don't we want to time out the clone
> request?  That is, after all, what is being queued in map_io().
> Unfortunately the clones don't appear to be associated with a request
> queue; they're just put on multipath's internal queue.

Hmm, "isn't being called" is strange.
If the clone is in multipath's internal queue, the original
should have been "started" from request queue point of view
and timeout should fire.

As for the "clone or original" question, if you are to use the block timer,
you have to use it for the original request (then perhaps let the handler
find its clone and kill it).
That's because (as you already see) clones are not associated with
request queue when it's queued in multipath internal queue
and when it's associated, it belongs to the lower device's queue.

-- 
Jun'ichi Nomura, NEC Corporation




More information about the dm-devel mailing list