[dm-devel] [PATCH] dm-mpath: push back requests instead of queueing
Hannes Reinecke
hare at suse.de
Tue Nov 12 08:17:07 UTC 2013
On 11/12/2013 08:48 AM, Junichi Nomura wrote:
> On 11/08/13 18:02, Hannes Reinecke wrote:
>> There is no reason why multipath needs to queue requests
>> internally for queue_if_no_path or pg_init; we should
>> rather push them back onto the request queue.
>>
>>
>> This patch removes the internal queueing mechanism from dm-multipath.
>
> Hi Hannes,
> thanks for working on this.
>
>> In doing so we can also remove the ->busy check, as a requeue is identical
>> to ->busy returning 'true' from the caller's perspective. This simplifies
>> the code by quite a lot.
>
> They are not identical.
> ->busy returns true when underlying path cannot dispatch a request
> immediately. In that case it's better to keep the request in queue
> to allow merges. (It used to have real impact on buffered sequential
> write + fsync workload, though the performance impact might become
> smaller in recent kernels due to plugging change.)
> Also ->busy can be used by upper layer (dm_table_any_busy_target).
> So you can't just remove it.
>
But they are identical from the caller's perspective:
drivers/md/dm.c:dm_request_fn():

		if (ti->type->busy && ti->type->busy(ti))
			goto delay_and_out;

		clone = dm_start_request(md, rq);
		spin_unlock(q->queue_lock);
		if (map_request(ti, clone, md))
			goto requeued;

		BUG_ON(!irqs_disabled());
		spin_lock(q->queue_lock);
	}

	goto out;

requeued:
	BUG_ON(!irqs_disabled());
	spin_lock(q->queue_lock);

delay_and_out:
	blk_delay_queue(q, HZ / 10);
So the only difference between ->busy and returning 'DM_MAPIO_REQUEUE'
is that ->busy runs under the queue_lock.
And yes, in theory ->busy might be used from any upper layer; but
so far the only caller I've found _is_ dm_request_fn().
So removing it doesn't do any harm.
Unless I've misread something ...
>> @@ -385,21 +378,15 @@ static int map_io(struct multipath *m, struct request *clone,
>> (!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
>> __choose_pgpath(m, nr_bytes);
>>
>> - pgpath = m->current_pgpath;
>> -
>> - if (was_queued)
>> - m->queue_size--;
>> + if (m->current_pgpath &&
>> + m->pg_init_required && !m->pg_init_in_progress)
>> + __pg_init_all_paths(m);
>>
>> + pgpath = m->current_pgpath;
>> if ((pgpath && m->queue_io) ||
>> (!pgpath && m->queue_if_no_path)) {
>> - /* Queue for the daemon to resubmit */
>> - list_add_tail(&clone->queuelist, &m->queued_ios);
>> - m->queue_size++;
>> - if ((m->pg_init_required && !m->pg_init_in_progress) ||
>> - !m->queue_io)
>> - queue_work(kmultipathd, &m->process_queued_ios);
>> pgpath = NULL;
>> - r = DM_MAPIO_SUBMITTED;
>> + r = DM_MAPIO_REQUEUE;
>
> if/else sequence in map_io() might be easier to read if we do:
>
> if (pgpath) {
> if (pg_ready(m)) {
> ... // remap
> r = DM_MAPIO_REMAPPED;
> } else {
> __pg_init_all_paths(m);
> r = DM_MAPIO_REQUEUE;
> }
> } else { /* no path */
> if (need_requeue(m))
> r = DM_MAPIO_REQUEUE;
> else
> r = -EIO;
> }
>
> where:
> #define pg_ready(m) (!m->queue_io) /* or rename 'queue_io' */
> #define need_requeue(m) (m->queue_if_no_path || __must_push_back(m))
>
> and move pg_init_required, etc. checks to __pg_init_all_paths().
>
True. Will be doing that.
>> @@ -1241,7 +1168,8 @@ static void pg_init_done(void *data, int errors)
>> m->queue_io = 0;
>>
>> m->pg_init_delay_retry = delay_retry;
>> - queue_work(kmultipathd, &m->process_queued_ios);
>> + if (!m->queue_io)
>> + dm_table_run_queue(m->ti->table);
>>
>> /*
>> * Wake up any thread waiting to suspend.
>
> process_queued_ios was responsible for retrying pg_init.
> And when retrying, m->queue_io is still 0.
> So don't we have to run queue unconditionally here
> or call __pg_init_all_paths() directly?
>
In my rework I _tried_ to separate the two functions of
process_queued_ios().
But yes, you are right; I hadn't considered the pg_init retry case.
Will be updating the patch.
>
>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>> index b3e26c7..20a19bd 100644
>> --- a/drivers/md/dm.c
>> +++ b/drivers/md/dm.c
>> @@ -1876,6 +1876,19 @@ static int dm_any_congested(void *congested_data, int bdi_bits)
>> return r;
>> }
>>
>> +void dm_table_run_queue(struct dm_table *t)
>> +{
>> + struct mapped_device *md = dm_table_get_md(t);
>> + unsigned long flags;
>> +
>> + if (md->queue) {
>> + spin_lock_irqsave(md->queue->queue_lock, flags);
>> + blk_run_queue_async(md->queue);
>> + spin_unlock_irqrestore(md->queue->queue_lock, flags);
>> + }
>> +}
>> +EXPORT_SYMBOL_GPL(dm_table_run_queue);
>
> Shouldn't this be in dm-table.c?
>
It would be logical.
But as 'struct mapped_device' is internal to dm.c, you would
then end up with something like:
    void dm_run_queue(struct mapped_device *md)
and I would need to call it as
    dm_run_queue(dm_table_get_md(t))
which I felt was a bit pointless, as this would be the _only_
valid calling sequence. Hence I moved the dm_table_get_md()
call into the function, even though it means a possible
layering violation.
Oh, the joys of device-mapper ...
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare at suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)