[dm-devel] [PATCH 1/3] dm: don't grab target io reference in dm_zone_map_bio

Damien Le Moal damien.lemoal at opensource.wdc.com
Tue Apr 12 01:17:27 UTC 2022


On 4/12/22 10:02, Ming Lei wrote:
> On Tue, Apr 12, 2022 at 09:28:46AM +0900, Damien Le Moal wrote:
>> On 4/12/22 09:09, Ming Lei wrote:
>>> On Tue, Apr 12, 2022 at 08:33:04AM +0900, Damien Le Moal wrote:
>>>> On 4/11/22 23:18, Ming Lei wrote:
>>>>>>>> This fixes the issue:
>>>>>>>>
>>>>>>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>>>>>>>> index 3c5fad7c4ee6..3dd6735450c5 100644
>>>>>>>> --- a/drivers/md/dm.c
>>>>>>>> +++ b/drivers/md/dm.c
>>>>>>>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device
>>>>>>>> *md, struct bio *bio)
>>>>>>>>         io->status = 0;
>>>>>>>>         atomic_set(&io->io_count, 1);
>>>>>>>>         this_cpu_inc(*md->pending_io);
>>>>>>>> -       io->orig_bio = NULL;
>>>>>>>> +       io->orig_bio = bio;
>>>>>>>>         io->md = md;
>>>>>>>>         io->map_task = current;
>>>>>>>>         spin_lock_init(&io->lock);
>>>>>>>>
>>>>>>>> Otherwise, the dm-zone.c code sees a NULL orig_bio.
>>>>>>>> However, this change may be messing up the bio accounting. Need to check that.
>>>>>>>
>>>>>>> Looks it is one recent regression since:
>>>>>>>
>>>>>>> commit 0fbb4d93b38b ("dm: add dm_submit_bio_remap interface")
>>>>>>
>>>>>> Yep, saw that. Problem is, I really do not understand that change setting
>>>>>> io->orig_bio *after* __map_bio() is called. It seems that the accounting
>>>>>> is done on each fragment of the orig_bio instead of once for the entire
>>>>>> BIO... So my "fix" above seems wrong. Apart from passing along orig_bio as
>>>>>> an argument to  __map_bio() from __split_and_process_bio(), I do not think
>>>>>> my change is correct. Thoughts ?
>>>>>
>>>>> Frankly speaking, both changing ->orig_bio after split and setting ->orig_bio
>>>>> after ->map() looks ugly & tricky, and the following change should avoid the
>>>>> issue, meantime simplify dm accounting a bit:
>>>>>
>>>>> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
>>>>> index 3c5fad7c4ee6..f1fe83113608 100644
>>>>> --- a/drivers/md/dm.c
>>>>> +++ b/drivers/md/dm.c
>>>>> @@ -581,7 +581,7 @@ static struct dm_io *alloc_io(struct mapped_device *md, struct bio *bio)
>>>>>  	io->status = 0;
>>>>>  	atomic_set(&io->io_count, 1);
>>>>>  	this_cpu_inc(*md->pending_io);
>>>>> -	io->orig_bio = NULL;
>>>>> +	io->orig_bio = bio;
>>>>>  	io->md = md;
>>>>>  	io->map_task = current;
>>>>>  	spin_lock_init(&io->lock);
>>>>> @@ -1223,19 +1223,11 @@ void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone)
>>>>>  	 * Account io->origin_bio to DM dev on behalf of target
>>>>>  	 * that took ownership of IO with DM_MAPIO_SUBMITTED.
>>>>>  	 */
>>>>> -	if (io->map_task == current) {
>>>>> +	if (io->map_task == current)
>>>>>  		/* Still in target's map function */
>>>>>  		dm_io_set_flag(io, DM_IO_START_ACCT);
>>>>> -	} else {
>>>>> -		/*
>>>>> -		 * Called by another thread, managed by DM target,
>>>>> -		 * wait for dm_split_and_process_bio() to store
>>>>> -		 * io->orig_bio
>>>>> -		 */
>>>>> -		while (unlikely(!smp_load_acquire(&io->orig_bio)))
>>>>> -			msleep(1);
>>>>> +	else
>>>>
>>>> Curly brackets around the else here.
>>>>
>>>>>  		dm_start_io_acct(io, clone);
>>>>> -	}
>>>>>  
>>>>>  	__dm_submit_bio_remap(tgt_clone, disk_devt(io->md->disk),
>>>>>  			      tio->old_sector);
>>>>> @@ -1562,7 +1554,7 @@ static void dm_split_and_process_bio(struct mapped_device *md,
>>>>>  				     struct dm_table *map, struct bio *bio)
>>>>>  {
>>>>>  	struct clone_info ci;
>>>>> -	struct bio *orig_bio = NULL;
>>>>> +	struct bio *new_bio = NULL;
>>>>>  	int error = 0;
>>>>>  
>>>>>  	init_clone_info(&ci, md, map, bio);
>>>>> @@ -1578,22 +1570,14 @@ static void dm_split_and_process_bio(struct mapped_device *md,
>>>>>  	if (error || !ci.sector_count)
>>>>>  		goto out;
>>>>>  
>>>>> -	/*
>>>>> -	 * Remainder must be passed to submit_bio_noacct() so it gets handled
>>>>> -	 * *after* bios already submitted have been completely processed.
>>>>> -	 * We take a clone of the original to store in ci.io->orig_bio to be
>>>>> -	 * used by dm_end_io_acct() and for dm_io_complete() to use for
>>>>> -	 * completion handling.
>>>>> -	 */
>>>>
>>>> This comment should remain with some adjustment.
>>>
>>> Fine, just felt the approach is very straightforward.
>>>
>>>>
>>>>> -	orig_bio = bio_split(bio, bio_sectors(bio) - ci.sector_count,
>>>>> -			     GFP_NOIO, &md->queue->bio_split);
>>>>> -	bio_chain(orig_bio, bio);
>>>>> -	trace_block_split(orig_bio, bio->bi_iter.bi_sector);
>>>>> -	submit_bio_noacct(bio);
>>>>> +	new_bio = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOIO,
>>>>> +			&md->queue->bio_split);
>>>>
>>>> Why not keep using bio_split() ?
>>>
>>> With bio_split(), 'bio' actually tracks the remainder, and the returned
>>> 'orig_bio' tracks the part for current target io, so ->orig_bio has to
>>> be updated in this way.
>>>
>>> With bio_alloc_clone() and bio_trim(), ->orig_bio needn't to be
>>> changed, and assigning it in alloc_io() works perfectly.
>>
>> OK. Got it.
>>
>>>>> +	bio_trim(new_bio, bio_sectors(bio) - ci.sector_count, ci.sector_count);
>>>>> +	bio_trim(bio, 0, bio_sectors(bio) - ci.sector_count);
>>>>> +	bio_chain(new_bio, bio);
>>>>> +	trace_block_split(new_bio, new_bio->bi_iter.bi_sector);
>>>>> +	submit_bio_noacct(new_bio);
>>>>>  out:
>>>>> -	if (!orig_bio)
>>>>> -		orig_bio = bio;
>>>>> -	smp_store_release(&ci.io->orig_bio, orig_bio);
>>>>>  	if (dm_io_flagged(ci.io, DM_IO_START_ACCT))
>>>>>  		dm_start_io_acct(ci.io, NULL);
>>>>
>>>> I tested this and it works. Need to check the accounting though.
>>>> And I agree this is a lot cleaner :)
>>>
>>> BTW, the cloned bio for split is just for accounting purpose, if
>>> ->bi_iter.bi_sector & ->bi_iter.bi_size can be stored in 'struct dm_io',
>>> the cloned bio can be avoided, but code may become not readable as
>>> before.
>>
>> The BIO op would be needed too for remapping zone append to regular writes
>> when zone append emulation is enabled. That is actually why
>> dm_zone_map_bio() needs the original BIO, to have the unmodified BIO op
>> since the clone op may not be the same.
> 
> clone inherits the original bio's op. I meant to store the
> real bio from FS as ->orig_bio always in alloc_io(), and simply trim it in case
> of split & re-submission, meantime store orig_bio's ->bi_sector & ->size into
> 'dm_io' for account purpose. But it looks a bit complicated and messy.

Indeed.

> Wrt. dm zone, I'd suggest to double check anywhere orig bio is used,
> since only part of orig bio may be mapped in case of splitting, which is
> actually post-split.
> 
> If bio split won't happen for dm-zone, your patch is fine. But I guess
> it isn't true for dm-zone.

Zone append operations can never be split. That is checked before a user
bio reaches DM __map_bio()/dm_zone_map_bio(). So I think that there is no
issue with the clone+trim change since that will never be done for a zone
append operation.

In the case of emulated zone append, regular writes will still go through
the dm_zone_map_bio() function, and these writes may be split. But then
the orig_bio does not really matter in that case and the zone write
pointer tracking relies on the clone sector & size, not on the original BIO.

I will run more tests with this patch. But so far, this seems all OK.

>> So I think this fix+cleanup as is is good for now. Will you send a proper
>> patch ?
> 
> Not yet, the fix+cleanup patch actually breaks recent dm io polling, and I
> don't figure out one solution yet.

OK. Let me know if you need help testing.


-- 
Damien Le Moal
Western Digital Research



More information about the dm-devel mailing list