[dm-devel] [PATCH 1/3] dm: don't grab target io reference in dm_zone_map_bio

Damien Le Moal damien.lemoal at opensource.wdc.com
Tue Apr 12 00:28:46 UTC 2022


On 4/12/22 09:09, Ming Lei wrote:
> On Tue, Apr 12, 2022 at 08:33:04AM +0900, Damien Le Moal wrote:
>> On 4/11/22 23:18, Ming Lei wrote:
>>>>>> This fixes the issue:
>>>>>>
>>>>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>>>>>> index 3c5fad7c4ee6..3dd6735450c5 100644
>>>>>> --- a/drivers/md/dm.c
>>>>>> +++ b/drivers/md/dm.c
>>>>>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device
>>>>>> *md, struct bio *bio)
>>>>>>         io->status = 0;
>>>>>>         atomic_set(&io->io_count, 1);
>>>>>>         this_cpu_inc(*md->pending_io);
>>>>>> -       io->orig_bio = NULL;
>>>>>> +       io->orig_bio = bio;
>>>>>>         io->md = md;
>>>>>>         io->map_task = current;
>>>>>>         spin_lock_init(&io->lock);
>>>>>>
>>>>>> Otherwise, the dm-zone.c code sees a NULL orig_bio.
>>>>>> However, this change may be messing up the bio accounting. Need to check that.
>>>>>
>>>>> Looks it is one recent regression since:
>>>>>
>>>>> commit 0fbb4d93b38b ("dm: add dm_submit_bio_remap interface")
>>>>
>>>> Yep, saw that. Problem is, I really do not understand that change setting
>>>> io->orig_bio *after* __map_bio() is called. It seems that the accounting
>>>> is done on each fragment of the orig_bio instead of once for the entire
>>>> BIO... So my "fix" above seems wrong. Apart from passing along orig_bio as
>>>> an argument to  __map_bio() from __split_and_process_bio(), I do not think
>>>> my change is correct. Thoughts ?
>>>
>>> Frankly speaking, both changing ->orig_bio after split and setting ->orig_bio
>>> after ->map() looks ugly & tricky, and the following change should avoid the
>>> issue, meantime simplify dm accounting a bit:
>>>
>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>>> index 3c5fad7c4ee6..f1fe83113608 100644
>>> --- a/drivers/md/dm.c
>>> +++ b/drivers/md/dm.c
>>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
>>>  	io->status = 0;
>>>  	atomic_set(&io->io_count, 1);
>>>  	this_cpu_inc(*md->pending_io);
>>> -	io->orig_bio = NULL;
>>> +	io->orig_bio = bio;
>>>  	io->md = md;
>>>  	io->map_task = current;
>>>  	spin_lock_init(&io->lock);
>>> @@ -1223,19 +1223,11 @@ void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone)
>>>  	 * Account io->origin_bio to DM dev on behalf of target
>>>  	 * that took ownership of IO with DM_MAPIO_SUBMITTED.
>>>  	 */
>>> -	if (io->map_task == current) {
>>> +	if (io->map_task == current)
>>>  		/* Still in target's map function */
>>>  		dm_io_set_flag(io, DM_IO_START_ACCT);
>>> -	} else {
>>> -		/*
>>> -		 * Called by another thread, managed by DM target,
>>> -		 * wait for dm_split_and_process_bio() to store
>>> -		 * io->orig_bio
>>> -		 */
>>> -		while (unlikely(!smp_load_acquire(&io->orig_bio)))
>>> -			msleep(1);
>>> +	else
>>
>> Curly brackets around the else here.
>>
>>>  		dm_start_io_acct(io, clone);
>>> -	}
>>>  
>>>  	__dm_submit_bio_remap(tgt_clone, disk_devt(io->md->disk),
>>>  			      tio->old_sector);
>>> @@ -1562,7 +1554,7 @@ static void dm_split_and_process_bio(struct mapped_device *md,
>>>  				     struct dm_table *map, struct bio *bio)
>>>  {
>>>  	struct clone_info ci;
>>> -	struct bio *orig_bio = NULL;
>>> +	struct bio *new_bio = NULL;
>>>  	int error = 0;
>>>  
>>>  	init_clone_info(&ci, md, map, bio);
>>> @@ -1578,22 +1570,14 @@ static void dm_split_and_process_bio(struct mapped_device *md,
>>>  	if (error || !ci.sector_count)
>>>  		goto out;
>>>  
>>> -	/*
>>> -	 * Remainder must be passed to submit_bio_noacct() so it gets handled
>>> -	 * *after* bios already submitted have been completely processed.
>>> -	 * We take a clone of the original to store in ci.io->orig_bio to be
>>> -	 * used by dm_end_io_acct() and for dm_io_complete() to use for
>>> -	 * completion handling.
>>> -	 */
>>
>> This comment should remain with some adjustment.
> 
> Fine, just felt the approach is very straightforward.
> 
>>
>>> -	orig_bio = bio_split(bio, bio_sectors(bio) - ci.sector_count,
>>> -			     GFP_NOIO, &md->queue->bio_split);
>>> -	bio_chain(orig_bio, bio);
>>> -	trace_block_split(orig_bio, bio->bi_iter.bi_sector);
>>> -	submit_bio_noacct(bio);
>>> +	new_bio = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO,
>>> +			&md->queue->bio_split);
>>
>> Why not keep using bio_split() ?
> 
> With bio_split(), 'bio' actually tracks the remainder, and the returned
> 'orig_bio' tracks the part for current target io, so ->orig_bio has to
> be updated in this way.
> 
> With bio_alloc_clone() and bio_trim(), ->orig_bio needn't to be
> changed, and assigning it in alloc_io() works perfectly.

OK. Got it.

>>> +	bio_trim(new_bio, bio_sectors(bio) - ci.sector_count, ci.sector_count);
>>> +	bio_trim(bio, 0, bio_sectors(bio) - ci.sector_count);
>>> +	bio_chain(new_bio, bio);
>>> +	trace_block_split(new_bio, new_bio->bi_iter.bi_sector);
>>> +	submit_bio_noacct(new_bio);
>>>  out:
>>> -	if (!orig_bio)
>>> -		orig_bio = bio;
>>> -	smp_store_release(&ci.io->orig_bio, orig_bio);
>>>  	if (dm_io_flagged(ci.io, DM_IO_START_ACCT))
>>>  		dm_start_io_acct(ci.io, NULL);
>>
>> I tested this and it works. Need to check the accounting though.
>> And I agree this is a lot cleaner :)
> 
> BTW, the cloned bio for split is just for accounting purpose, if
> ->bi_iter.bi_sector & ->bi_iter.bi_size can be stored in 'struct dm_io',
> the cloned bio can be avoided, but code may become not readable as
> before.

The BIO op would be needed too for remapping zone append to regular writes
when zone append emulation is enabled. That is actually why
dm_zone_map_bio() needs the original BIO, to have the unmodified BIO op
since the clone op may not be the same.

So I think this fix+cleanup as is is good for now. Will you send a proper
patch ?

> 
> 
> Thanks,
> Ming
> 


-- 
Damien Le Moal
Western Digital Research



More information about the dm-devel mailing list