[dm-devel] [PATCH/rfc] dm: revise 'rescue' strategy for md->bs allocations

Mikulas Patocka mpatocka at redhat.com
Wed Sep 6 22:57:47 UTC 2017



On Wed, 6 Sep 2017, NeilBrown wrote:

> Thanks for the extra explanation.  I've thought some more about this I can
> see a way forward that I am comfortable with.  Please let me know what
> you think.
> I haven't tested this patch yet, but I believe that applying it first will
> clear the way for my other patch to work without reintroducing
> deadlocks.
> 
> Thanks,
> NeilBrown

What's the purpose of this patch? If the current code that offloads bios 
on current->bio_list to a rescue thread works, why do you want to change 
it?

Mikulas

> From: NeilBrown <neilb at suse.com>
> Date: Wed, 6 Sep 2017 09:14:52 +1000
> Subject: [PATCH] dm: ensure bio submission follows a depth-first tree walk.
> 
> A dm device can, in general, represent a tree of targets,
> each of which handles a sub-range of the range of blocks handled
> by the parent.
> 
> The bio sequencing managed by generic_make_request() requires that bios
> are generated and handled in a depth-first manner.  Each call to a
> make_request_fn() may submit bios to a single member device, and may
> submit bios for a reduced region of the same device as the
> make_request_fn.
> 
> In particular, any bios submitted to member devices must be expected to
> be processed in order, so a later one must never wait for an earlier
> one.
> 
> This ordering is usually achieved by using bio_split() to reduce a bio
> to a size that can be completely handled by one target, and resubmitting
> the remainder to the originating device. bio_queue_split() shows the canonical
> approach.
> 
> dm doesn't follow this approach, largely because it has needed to split
> bios since long before bio_split() was available.  It currently can
> submit bios to separate targets within the one dm_make_request() call.
> Dependencies between these targets, as can happen with dm-snap, can
> cause deadlocks if either bios gets stuck behind the other in the queues
> managed by generic_make_request().  This requires the 'rescue'
> functionality provided by dm_offload*.
> 
> Some of this requirement can be removed by changing the order of bio
> submission to follow the canonical approach.  That is, if dm finds that
> it needs to split a bio, the remainder should be sent to
> generic_make_request() rather than being handled immediately.  This delays
> the handling until the first part is completely processed, so the
> deadlock problems do not occur.
> 
> __split_and_process_bio() can be called both from dm_make_request() and
> from dm_wq_work().  When called from dm_wq_work() the current approach
> is perfectly satisfactory as each bio will be processed immediately.
> When called from dm_make_request, current->bio_list will be non-NULL,
> and in this case it is best to create a separate "clone" bio for the
> remainder.
> 
> To provide the clone bio when splitting, we use q->bio_split().  This
> was previously being freed to avoid having excess rescuer threads.  As
> bio_split bio sets no longer create rescuer threads, there is little
> cost and much gain from leaving the ->bio_split bio set in place.
> 
> Signed-off-by: NeilBrown <neilb at suse.com>
> ---
>  drivers/md/dm.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index d669fddd9290..81395f7550c0 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1007,7 +1007,7 @@ static void dm_dax_flush(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
>  
>  /*
>   * A target may call dm_accept_partial_bio only from the map routine.  It is
> - * allowed for all bio types except REQ_PREFLUSH.
> + * allowed for all bio types except REQ_PREFLUSH and REQ_OP_ZONE_RESET.
>   *
>   * dm_accept_partial_bio informs the dm that the target only wants to process
>   * additional n_sectors sectors of the bio and the rest of the data should be
> @@ -1508,8 +1508,21 @@ static void __split_and_process_bio(struct mapped_device *md,
>  	} else {
>  		ci.bio = bio;
>  		ci.sector_count = bio_sectors(bio);
> -		while (ci.sector_count && !error)
> +		while (ci.sector_count && !error) {
>  			error = __split_and_process_non_flush(&ci);
> +			if (current->bio_list && ci.sector_count && !error) {
> +				/* Remainder must be passed to generic_make_request()
> +				 * so that it gets handled *after* bios already submitted
> +				 * have been completely processed.
> +				 */
> +				struct bio *b = bio_clone_bioset(bio, GFP_NOIO,
> +								 md->queue->bio_split);
> +				bio_advance(b, (bio_sectors(b) - ci.sector_count) << 9);
> +				bio_chain(b, bio);
> +				generic_make_request(b);
> +				break;
> +			}
> +		}
>  	}
>  
>  	/* drop the extra reference count */
> @@ -1520,8 +1533,8 @@ static void __split_and_process_bio(struct mapped_device *md,
>   *---------------------------------------------------------------*/
>  
>  /*
> - * The request function that just remaps the bio built up by
> - * dm_merge_bvec.
> + * The request function that remaps the bio to one target and
> + * splits off any remainder.
>   */
>  static blk_qc_t dm_make_request(struct request_queue *q, struct bio *bio)
>  {
> @@ -2056,12 +2069,6 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
>  	case DM_TYPE_DAX_BIO_BASED:
>  		dm_init_normal_md_queue(md);
>  		blk_queue_make_request(md->queue, dm_make_request);
> -		/*
> -		 * DM handles splitting bios as needed.  Free the bio_split bioset
> -		 * since it won't be used (saves 1 process per bio-based DM device).
> -		 */
> -		bioset_free(md->queue->bio_split);
> -		md->queue->bio_split = NULL;
>  
>  		if (type == DM_TYPE_DAX_BIO_BASED)
>  			queue_flag_set_unlocked(QUEUE_FLAG_DAX, md->queue);
> -- 
> 2.14.0
> 
> 




More information about the dm-devel mailing list