[dm-devel] [PATCH] dm-mpath: push back requests instead of queueing

Hannes Reinecke hare at suse.de
Tue Nov 12 08:17:07 UTC 2013


On 11/12/2013 08:48 AM, Junichi Nomura wrote:
> On 11/08/13 18:02, Hannes Reinecke wrote:
>> There is no reason why multipath needs to queue requests
>> internally for queue_if_no_path or pg_init; we should
>> rather push them back onto the request queue.
>>
>>
>> This patch removes the internal queueing mechanism from dm-multipath.
> 
> Hi Hannes,
> thanks for working on this.
> 
>> In doing so we can also remove the ->busy check as a requeue is identical
>> to ->busy returning 'true' from the caller's perspective. This simplifies
>> the code by quite a lot.
> 
> They are not identical.
> ->busy returns true when the underlying path cannot dispatch a request
> immediately. In that case it's better to keep the request in the queue
> to allow merges.  (It used to have real impact on buffered sequential
> write + fsync workload, though the performance impact might become
> smaller in recent kernels due to plugging change.)
> Also ->busy can be used by an upper layer (dm_table_any_busy_target).
> So you can't just remove it.
> 
But they are identical from the caller's perspective:
drivers/md/dm.c:dm_request_fn()

		if (ti->type->busy && ti->type->busy(ti))
			goto delay_and_out;

		clone = dm_start_request(md, rq);

		spin_unlock(q->queue_lock);
		if (map_request(ti, clone, md))
			goto requeued;

		BUG_ON(!irqs_disabled());
		spin_lock(q->queue_lock);
	}

	goto out;

requeued:
	BUG_ON(!irqs_disabled());
	spin_lock(q->queue_lock);

delay_and_out:
	blk_delay_queue(q, HZ / 10);

So the only difference between ->busy and returning 'DM_MAPIO_REQUEUE'
is that ->busy runs under the queue_lock.

And yes, in theory ->busy might be called from any upper layer; but
so far the only caller I've found _is_ dm_request_fn().
So removing it doesn't do any harm.

Unless I've misread something ...
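
(For reference, the DM_MAPIO_REQUEUE handling sits in map_request();
roughly like this, condensed from memory, with tracing and error
handling dropped:

static int map_request(struct dm_target *ti, struct request *clone,
		       struct mapped_device *md)
{
	int r, requeued = 0;
	struct dm_rq_target_io *tio = clone->end_io_data;

	tio->ti = ti;
	r = ti->type->map_rq(ti, clone, &tio->info);
	switch (r) {
	case DM_MAPIO_SUBMITTED:
		/* the target has taken over the I/O */
		break;
	case DM_MAPIO_REMAPPED:
		/* remapped: dispatch to the underlying device */
		dm_dispatch_request(clone);
		break;
	case DM_MAPIO_REQUEUE:
		/* push the clone back onto our request queue */
		dm_requeue_unmapped_request(clone);
		requeued = 1;
		break;
	}
	return requeued;
}

So a non-zero return lands dm_request_fn() in the same
blk_delay_queue() path as a 'true' return from ->busy.)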

>> @@ -385,21 +378,15 @@ static int map_io(struct multipath *m, struct request *clone,
>>  	    (!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
>>  		__choose_pgpath(m, nr_bytes);
>>  
>> -	pgpath = m->current_pgpath;
>> -
>> -	if (was_queued)
>> -		m->queue_size--;
>> +	if (m->current_pgpath &&
>> +	    m->pg_init_required && !m->pg_init_in_progress)
>> +		__pg_init_all_paths(m);
>>  
>> +	pgpath = m->current_pgpath;
>>  	if ((pgpath && m->queue_io) ||
>>  	    (!pgpath && m->queue_if_no_path)) {
>> -		/* Queue for the daemon to resubmit */
>> -		list_add_tail(&clone->queuelist, &m->queued_ios);
>> -		m->queue_size++;
>> -		if ((m->pg_init_required && !m->pg_init_in_progress) ||
>> -		    !m->queue_io)
>> -			queue_work(kmultipathd, &m->process_queued_ios);
>>  		pgpath = NULL;
>> -		r = DM_MAPIO_SUBMITTED;
>> +		r = DM_MAPIO_REQUEUE;
> 
> if/else sequence in map_io() might be easier to read if we do:
> 
> 	if (pgpath) {
> 		if (pg_ready(m)) {
> 			... // remap
> 			r = DM_MAPIO_REMAPPED;
> 		} else {
> 			__pg_init_all_paths(m);
> 			r = DM_MAPIO_REQUEUE;
> 		}
> 	} else { /* no path */
> 		if (need_requeue(m))
> 			r = DM_MAPIO_REQUEUE;
> 		else
> 			r = -EIO;
> 	}
> 
> where:
>   #define pg_ready(m) (!m->queue_io) /* or rename 'queue_io' */
>   #define need_requeue(m) (m->queue_if_no_path || __must_push_back(m))
> 
> and move pg_init_required, etc. checks to __pg_init_all_paths().
> 
True. Will be doing that.
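
Roughly like this, I guess (untested sketch, using the pg_ready() /
need_requeue() helpers you proposed; the map_info/stats bookkeeping
and the start_io hook are omitted):

static int map_io(struct multipath *m, struct request *clone,
		  union map_info *map_context)
{
	int r = DM_MAPIO_REQUEUE;
	size_t nr_bytes = blk_rq_bytes(clone);
	struct pgpath *pgpath;
	unsigned long flags;

	spin_lock_irqsave(&m->lock, flags);

	/* Do we need to select a new pgpath? */
	if (!m->current_pgpath ||
	    (!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
		__choose_pgpath(m, nr_bytes);

	pgpath = m->current_pgpath;
	if (pgpath) {
		if (pg_ready(m)) {
			/* remap onto the selected path */
			clone->q = bdev_get_queue(pgpath->path.dev->bdev);
			clone->rq_disk = pgpath->path.dev->bdev->bd_disk;
			r = DM_MAPIO_REMAPPED;
		} else {
			/* pg_init still outstanding: kick it off, push back */
			__pg_init_all_paths(m);
			r = DM_MAPIO_REQUEUE;
		}
	} else {
		/* no usable path */
		r = need_requeue(m) ? DM_MAPIO_REQUEUE : -EIO;
	}

	spin_unlock_irqrestore(&m->lock, flags);

	return r;
}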

>> @@ -1241,7 +1168,8 @@ static void pg_init_done(void *data, int errors)
>>  		m->queue_io = 0;
>>  
>>  	m->pg_init_delay_retry = delay_retry;
>> -	queue_work(kmultipathd, &m->process_queued_ios);
>> +	if (!m->queue_io)
>> +		dm_table_run_queue(m->ti->table);
>>  
>>  	/*
>>  	 * Wake up any thread waiting to suspend.
> 
> process_queued_ios was responsible for retrying pg_init.
> And when retrying, m->queue_io is still 0.
> So don't we have to run queue unconditionally here
> or call __pg_init_all_paths() directly?
> 
In my rework I've _tried_ to separate the two functions of
process_queued_ios() (resubmitting queued I/O and retrying pg_init).
But yes, you are right; I hadn't considered the pg_init retry case.
Will be updating the patch.
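
I.e. the tail of pg_init_done() (with m->lock held) would then become
something like (sketch only):

	m->pg_init_delay_retry = delay_retry;
	if (m->pg_init_required) {
		/* retryable pg_init failure: start another round here
		 * instead of relying on process_queued_ios() */
		__pg_init_all_paths(m);
	} else {
		/* pg_init finished: let the pushed-back requests flow */
		m->queue_io = 0;
		dm_table_run_queue(m->ti->table);
	}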

> 
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index b3e26c7..20a19bd 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -1876,6 +1876,19 @@ static int dm_any_congested(void *congested_data, int bdi_bits)
>>  	return r;
>>  }
>>  
>> +void dm_table_run_queue(struct dm_table *t)
>> +{
>> +	struct mapped_device *md = dm_table_get_md(t);
>> +	unsigned long flags;
>> +
>> +	if (md->queue) {
>> +		spin_lock_irqsave(md->queue->queue_lock, flags);
>> +		blk_run_queue_async(md->queue);
>> +		spin_unlock_irqrestore(md->queue->queue_lock, flags);
>> +	}
>> +}
>> +EXPORT_SYMBOL_GPL(dm_table_run_queue);
> 
> Shouldn't this be in dm-table.c?
> 
It would be logical.
But as 'struct mapped_device' is internal to dm.c, you
would then end up with something like:

void dm_run_queue(struct mapped_device *md)

and I would need to call it like

dm_run_queue(dm_table_get_md(t))

which felt a bit pointless, as this would be the _only_
valid calling sequence. Hence I moved the call to
dm_table_get_md() into the function, even though it means
a possible layering violation.
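
I.e. the alternative would have looked roughly like this (sketch):

/* drivers/md/dm.c */
void dm_run_queue(struct mapped_device *md)
{
	unsigned long flags;

	if (md->queue) {
		spin_lock_irqsave(md->queue->queue_lock, flags);
		blk_run_queue_async(md->queue);
		spin_unlock_irqrestore(md->queue->queue_lock, flags);
	}
}
EXPORT_SYMBOL_GPL(dm_run_queue);

/* drivers/md/dm-mpath.c, e.g. in pg_init_done() */
	dm_run_queue(dm_table_get_md(m->ti->table));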

Oh, the joys of device-mapper ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare at suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
