[dm-devel] [PATCH 1/3] block: fix blk_rq_get_max_sectors() to flow more carefully

Tue Sep 15 02:15:23 UTC 2020

On 2020/09/15 11:04, Ming Lei wrote:
> On Mon, Sep 14, 2020 at 12:43:06AM +0000, Damien Le Moal wrote:
>> On 2020/09/12 22:53, Ming Lei wrote:
>>> On Fri, Sep 11, 2020 at 05:53:36PM -0400, Mike Snitzer wrote:
>>>> blk_queue_get_max_sectors() has been trained for REQ_OP_WRITE_SAME and
>>>> REQ_OP_WRITE_ZEROES yet blk_rq_get_max_sectors() didn't call it for
>>>> those operations.
>>>
>>> Actually WRITE_SAME & WRITE_ZEROS are handled by the following if
>>> chunk_sectors is set:
>>>
>>>         return min(blk_max_size_offset(q, offset),
>>>                         blk_queue_get_max_sectors(q, req_op(rq)));
>>>  
>>>> Also, there is no need to avoid blk_max_size_offset() if
>>>> 'chunk_sectors' isn't set because it falls back to 'max_sectors'.
>>>>
>>>> Signed-off-by: Mike Snitzer <snitzer at redhat.com>
>>>> ---
>>>>  include/linux/blkdev.h | 19 +++++++++++++------
>>>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>>> index bb5636cc17b9..453a3d735d66 100644
>>>> --- a/include/linux/blkdev.h
>>>> +++ b/include/linux/blkdev.h
>>>> @@ -1070,17 +1070,24 @@ static inline unsigned int blk_rq_get_max_sectors(struct request *rq,
>>>>  						  sector_t offset)
>>>>  {
>>>>  	struct request_queue *q = rq->q;
>>>> +	int op;
>>>> +	unsigned int max_sectors;
>>>>  
>>>>  	if (blk_rq_is_passthrough(rq))
>>>>  		return q->limits.max_hw_sectors;
>>>>  
>>>> -	if (!q->limits.chunk_sectors ||
>>>> -	    req_op(rq) == REQ_OP_DISCARD ||
>>>> -	    req_op(rq) == REQ_OP_SECURE_ERASE)
>>>> -		return blk_queue_get_max_sectors(q, req_op(rq));
>>>> +	op = req_op(rq);
>>>> +	max_sectors = blk_queue_get_max_sectors(q, op);
>>>>  
>>>> -	return min(blk_max_size_offset(q, offset),
>>>> -			blk_queue_get_max_sectors(q, req_op(rq)));
>>>> +	switch (op) {
>>>> +	case REQ_OP_DISCARD:
>>>> +	case REQ_OP_SECURE_ERASE:
>>>> +	case REQ_OP_WRITE_SAME:
>>>> +	case REQ_OP_WRITE_ZEROES:
>>>> +		return max_sectors;
>>>> +	}>> +
>>>> +	return min(blk_max_size_offset(q, offset), max_sectors);
>>>>  }
>>>
>>> It depends if offset & chunk_sectors limit for WRITE_SAME & WRITE_ZEROS
>>> needs to be considered.
>>
>> That limit is needed for zoned block devices to ensure that *any* write request,
>> no matter the command, do not cross zone boundaries. Otherwise, the write would
>> be immediately failed by the device.
> 
> Looks both blk_bio_write_zeroes_split() and blk_bio_write_same_split()
> don't consider chunk_sectors limit, is that an issue for zone block?

Hu... Never looked at these. Yes, it will be a problem. write zeroes for NVMe
ZNS drives and write same for SCSI/ZBC drives. So yes, definitely something
that needs to be fixed. User of these will be file systems that in the case of
zoned block devices would be FS with zone support. f2fs does not use these
commands, and btrfs (posted recently) needs to be checked. But the FS itself
being zone aware, the requests will be zone aligned.

But definitely worth fixing.

Thanks !

-- 
Damien Le Moal
Western Digital Research