[dm-devel] [PATCH 1/5] blk-mq: introduce BLK_STS_DEV_RESOURCE

Mon Jan 22 16:49:54 UTC 2018

On Mon, 2018-01-22 at 11:35 +0800, Ming Lei wrote:
> @@ -1280,10 +1282,18 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
>  		 * - Some but not all block drivers stop a queue before
>  		 *   returning BLK_STS_RESOURCE. Two exceptions are scsi-mq
>  		 *   and dm-rq.
> +		 *
> +		 * If drivers return BLK_STS_RESOURCE and S_SCHED_RESTART
> +		 * bit is set, run queue after 10ms for avoiding IO hang
> +		 * because the queue may be idle and the RESTART mechanism
> +		 * can't work any more.
>  		 */
> -		if (!blk_mq_sched_needs_restart(hctx) ||
> +		needs_restart = blk_mq_sched_needs_restart(hctx);
> +		if (!needs_restart ||
>  		    (no_tag && list_empty_careful(&hctx->dispatch_wait.entry)))
>  			blk_mq_run_hw_queue(hctx, true);
> +		else if (needs_restart && (ret == BLK_STS_RESOURCE))
> +			blk_mq_delay_run_hw_queue(hctx, 10);
>  	}

In my opinion there are two problems with the above changes:
* Only the block driver author can know what a good value is for the delay
  after which to rerun the queue. So I think moving the rerun delay constant
  (10 ms) from block drivers into the core is a step backwards instead of a
  step forwards.
* The purpose of the BLK_MQ_S_SCHED_RESTART flag is to detect whether or not
  any of the queue runs triggered by freeing a tag happened concurrently. I
  don't think that there is any relationship between whether or not queue
  runs happen concurrently and the chance that driver resources become
  available. So deciding whether or not a queue should be rerun based on the
  value of the BLK_MQ_S_SCHED_RESTART flag seems wrong to me.
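
To make the second point concrete, here is a minimal userspace sketch of the
rerun decision in the hunk quoted above, written as a pure function. The enum
names and the function rerun_decision() are hypothetical and exist only for
illustration; the boolean parameters stand in for the kernel expressions named
in the comments. It is a simplified model, not the kernel code itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel status codes; values are arbitrary. */
enum sketch_status { STS_RESOURCE, STS_DEV_RESOURCE };

/* What the dispatch path does with the hardware queue after a failed dispatch. */
enum sketch_action { RUN_NOW, RUN_AFTER_10MS, NO_RERUN };

/*
 * Mirrors the if / else-if chain of the quoted hunk.
 *   needs_restart   : result of blk_mq_sched_needs_restart(hctx)
 *   no_tag          : no driver tag was available
 *   wait_list_empty : list_empty_careful(&hctx->dispatch_wait.entry)
 *   ret             : status returned by the driver
 */
static enum sketch_action rerun_decision(bool needs_restart, bool no_tag,
					 bool wait_list_empty,
					 enum sketch_status ret)
{
	if (!needs_restart || (no_tag && wait_list_empty))
		return RUN_NOW;		/* blk_mq_run_hw_queue(hctx, true) */
	/* needs_restart is necessarily true once we get here */
	if (ret == STS_RESOURCE)
		return RUN_AFTER_10MS;	/* blk_mq_delay_run_hw_queue(hctx, 10) */
	return NO_RERUN;		/* left to the RESTART mechanism */
}
```

In this model every outcome, including the one where no rerun is scheduled,
is selected by the value of needs_restart together with the returned status.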

> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index d9ca1dfab154..55be2550c555 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2030,9 +2030,9 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
>  	case BLK_STS_OK:
>  		break;
>  	case BLK_STS_RESOURCE:
> -		if (atomic_read(&sdev->device_busy) == 0 &&
> -		    !scsi_device_blocked(sdev))
> -			blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> +		if (atomic_read(&sdev->device_busy) ||
> +		    scsi_device_blocked(sdev))
> +			ret = BLK_STS_DEV_RESOURCE;
>  		break;
>  	default:
>  		/*

The above introduces two changes that have not been mentioned in the
description of this patch:
- The queue rerunning delay is changed from 3 ms to 10 ms. Where is the
  explanation of this change? Does this change have a positive or negative
  performance impact?
- The above turns a guaranteed queue rerun into a queue rerun that
  may or may not happen, depending on whether or not multiple tags get freed
  concurrently (return BLK_STS_DEV_RESOURCE). Sorry, but I think that's wrong.
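
For reference, a small userspace sketch of the driver-side change in the
scsi_queue_rq() hunk, comparing the removed and the added lines. All names
below are hypothetical; device_busy and blocked stand in for
atomic_read(&sdev->device_busy) and scsi_device_blocked(sdev), and the 3 ms
value is the old delay mentioned above. This is a simplified model under
those assumptions, not the SCSI code.

```c
#include <stdbool.h>

/* Old driver-side delay in ms (the "3 ms" above); hypothetical macro name. */
#define SKETCH_SCSI_QUEUE_DELAY_MS 3

/* Hypothetical stand-ins for the two status codes; values are arbitrary. */
enum scsi_sketch_status { SCSI_STS_RESOURCE, SCSI_STS_DEV_RESOURCE };

/*
 * Removed lines: the driver itself schedules a delayed rerun.
 * Returns the delay in ms, or -1 if the driver schedules nothing.
 */
static int old_driver_rerun_delay_ms(int device_busy, bool blocked)
{
	if (device_busy == 0 && !blocked)
		return SKETCH_SCSI_QUEUE_DELAY_MS;
	return -1;
}

/*
 * Added lines: the driver schedules nothing and only picks the status
 * that is handed back to the block layer core.
 */
static enum scsi_sketch_status new_driver_status(int device_busy, bool blocked)
{
	if (device_busy || blocked)
		return SCSI_STS_DEV_RESOURCE;
	return SCSI_STS_RESOURCE;
}
```

After the change the driver no longer chooses a delay; whether and when the
queue is rerun is decided by the core-side code quoted at the top.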

Bart.