[dm-devel] [PATCH] dm-mpath: push back requests instead of queueing

Tue Nov 12 08:49:16 UTC 2013

On 11/12/2013 09:17 AM, Hannes Reinecke wrote:
> On 11/12/2013 08:48 AM, Junichi Nomura wrote:
>> On 11/08/13 18:02, Hannes Reinecke wrote:
>>> There is no reason why multipath needs to queue requests
>>> internally for queue_if_no_path or pg_init; we should
>>> rather push them back onto the request queue.
>>>
>>>
>>> This patch removes the internal queueing mechanism from dm-multipath.
>>
>> Hi Hannes,
>> thanks for working on this.
>>
>>> In doing so we can also remove the ->busy check as a requeue is identical
>>> to ->busy returning 'true' from the callers perspective. This simplifies
>>> the code by quite a lot.
>>
>> They are not identical.
>> ->busy returns true when underlying path cannot dispatch a request
>> immediately. In that case it's better to keep the request in queue
>> to allow merges.  (It used to have real impact on buffered sequential
>> write + fsync workload, though the performance impact might become
>> smaller in recent kernels due to plugging change.)
>> Also ->busy can be used by upper layer (dm_table_any_busy_target).
>> So you can't just remove it.
>>
> But they are identical from the callers perspective:
> drivers/md/dm.c:dm_request_fn()
> 
> 		if (ti->type->busy && ti->type->busy(ti))
> 			goto delay_and_out;
> 
> 		clone = dm_start_request(md, rq);
> 
> 		spin_unlock(q->queue_lock);
> 		if (map_request(ti, clone, md))
> 			goto requeued;
> 
> 		BUG_ON(!irqs_disabled());
> 		spin_lock(q->queue_lock);
> 	}
> 
> 	goto out;
> 
> requeued:
> 	BUG_ON(!irqs_disabled());
> 	spin_lock(q->queue_lock);
> 
> delay_and_out:
> 	blk_delay_queue(q, HZ / 10);
> 
> So the only difference between ->busy and return 'DM_MAPIO_REQUEUE'
> is that 'busy' runs under the queue_lock.
> 
> And yes, in theory ->busy might be used from any upper layer; but
> so far the only caller I've found _is_ in dm_request_fn().
> So removing doesn't do any harm.
> 
> Unless I've misread something ...
> 
>>> @@ -385,21 +378,15 @@ static int map_io(struct multipath *m, struct request *clone,
>>>  	    (!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
>>>  		__choose_pgpath(m, nr_bytes);
>>>  
>>> -	pgpath = m->current_pgpath;
>>> -
>>> -	if (was_queued)
>>> -		m->queue_size--;
>>> +	if (m->current_pgpath &&
>>> +	    m->pg_init_required && !m->pg_init_in_progress)
>>> +		__pg_init_all_paths(m);
>>>  
>>> +	pgpath = m->current_pgpath;
>>>  	if ((pgpath && m->queue_io) ||
>>>  	    (!pgpath && m->queue_if_no_path)) {
>>> -		/* Queue for the daemon to resubmit */
>>> -		list_add_tail(&clone->queuelist, &m->queued_ios);
>>> -		m->queue_size++;
>>> -		if ((m->pg_init_required && !m->pg_init_in_progress) ||
>>> -		    !m->queue_io)
>>> -			queue_work(kmultipathd, &m->process_queued_ios);
>>>  		pgpath = NULL;
>>> -		r = DM_MAPIO_SUBMITTED;
>>> +		r = DM_MAPIO_REQUEUE;
>>
>> if/else sequence in map_io() might be easier to read if we do:
>>
>> 	if (pgpath) {
>> 		if (pg_ready(m)) {
>> 			... // remap
>> 			r = DM_MAPIO_REMAPPED;
>> 		} else {
>> 			__pg_init_all_paths(m);
>> 			r = DM_MAPIO_REQUEUE;
>> 		}
>> 	} else { /* no path */
>> 		if (need_requeue(m))
>> 			r = DM_MAPIO_REQUEUE;
>> 		else
>> 			r = -EIO;
>> 	}
>>
>> where:
>>   #define pg_ready(m) (!m->queue_io) /* or rename 'queue_io' */
>>   #define need_requeue(m) (m->queue_if_no_path || __must_push_back(m))
>>
>> and move pg_init_required, etc. checks to __pg_init_all_paths().
>>
> True. Will be doing that.
> 
>>> @@ -1241,7 +1168,8 @@ static void pg_init_done(void *data, int errors)
>>>  		m->queue_io = 0;
>>>  
>>>  	m->pg_init_delay_retry = delay_retry;
>>> -	queue_work(kmultipathd, &m->process_queued_ios);
>>> +	if (!m->queue_io)
>>> +		dm_table_run_queue(m->ti->table);
>>>  
>>>  	/*
>>>  	 * Wake up any thread waiting to suspend.
>>
>> process_queued_ios was responsible for retrying pg_init.
>> And when retrying, m->queue_io is still 0.
>> So don't we have to run queue unconditionally here
>> or call __pg_init_all_paths() directly?
>>
> In my rework I've _tried_ to separate both functions from
> process_queued_ios().
> But yes, you are right; I haven't considered pg_init_retry.
> Will be updating the patch.
> 
Actually, I have. I just had a closer look at the patch,
and pg_init retry is handled, albeit differently than in
the original.

It now works like this:

->map_io() is called
  -> calls '__switch_pg', which sets 'queue_io'
  -> calls __pg_init_all_paths, which pushes activate_path
     onto a workqueue
  -> returns 'MAPIO_REQUEUE'

-> pg_init_done()
  -> Checks pg_init_required
     -> if non-zero some other I/O already
        kicked off an 'activate_path',
        so nothing to be done from our side
     -> if zero we're calling kicking the queue
        via blk_run_queue

And blk_run_queue() will be calling into ->request_fn,
which will pull requests off the queue.
So on the next request we're calling 'map_io', so the
entire game starts anew, retrying pg_init.

The only thing we're not handling properly is the
'pg_init_delay_retry', as for that we should've
started the queue with a certain delay, which
we currently don't. But that's easily fixable.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare at suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)