[dm-devel] [PATCH 2/2] block: Avoid deadlocks with bio allocation by stacking drivers

Tejun Heo tj at kernel.org
Sat Sep 8 19:36:41 UTC 2012


(Restoring cc list from the previous discussion.  Please retain the cc
list of the people who discussed in the previous postings.)

On Fri, Sep 07, 2012 at 03:12:53PM -0700, Kent Overstreet wrote:
> But this is tricky and not a generic solution. This patch solves it for
> all users by inverting the previously described technique. We allocate a
> rescuer workqueue for each bio_set, and then in the allocation code if
> there are bios on current->bio_list we would be blocking, we punt them
> to the rescuer workqueue to be submitted.

It would be great if this explanation can be expanded a bit.  Why does
it make the deadlock condition go away?  What are the restrictions -
e.g. using other mempools for additional per-bio data structure
wouldn't work, right?

> +static void punt_bios_to_rescuer(struct bio_set *bs)
> +{
> +	struct bio_list punt, nopunt;
> +	struct bio *bio;
> +
> +	bio_list_init(&punt);
> +	bio_list_init(&nopunt);
> +
> +	while ((bio = bio_list_pop(current->bio_list)))
> +		bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> +
> +	*current->bio_list = nopunt;

Why this is necessary needs explanation and it's done in rather
unusual way.  I suppose the weirdness is from bio_list API
restriction?

> +	spin_lock(&bs->rescue_lock);
> +	bio_list_merge(&bs->rescue_list, &punt);
> +	spin_unlock(&bs->rescue_lock);
> +
> +	queue_work(bs->rescue_workqueue, &bs->rescue_work);
> +}
> +
>  /**
>   * bio_alloc_bioset - allocate a bio for I/O
>   * @gfp_mask:   the GFP_ mask given to the slab allocator
> @@ -317,6 +354,7 @@ EXPORT_SYMBOL(bio_reset);
>   */
>  struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  {
> +	gfp_t saved_gfp = gfp_mask;
>  	unsigned front_pad;
>  	unsigned inline_vecs;
>  	unsigned long idx = BIO_POOL_NONE;
> @@ -334,13 +372,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
>  		front_pad = 0;
>  		inline_vecs = nr_iovecs;
>  	} else {
> +		/*
> +		 * generic_make_request() converts recursion to iteration; this
> +		 * means if we're running beneath it, any bios we allocate and
> +		 * submit will not be submitted (and thus freed) until after we
> +		 * return.
> +		 *
> +		 * This exposes us to a potential deadlock if we allocate
> +		 * multiple bios from the same bio_set() while running
> +		 * underneath generic_make_request(). If we were to allocate
> +		 * multiple bios (say a stacking block driver that was splitting
> +		 * bios), we would deadlock if we exhausted the mempool's
> +		 * reserve.
> +		 *
> +		 * We solve this, and guarantee forward progress, with a rescuer
> +		 * workqueue per bio_set. If we go to allocate and there are
> +		 * bios on current->bio_list, we first try the allocation
> +		 * without __GFP_WAIT; if that fails, we punt those bios we
> +		 * would be blocking to the rescuer workqueue before we retry
> +		 * with the original gfp_flags.
> +		 */
> +
> +		if (current->bio_list && !bio_list_empty(current->bio_list))
> +			gfp_mask &= ~__GFP_WAIT;
> +retry:
>  		p = mempool_alloc(bs->bio_pool, gfp_mask);
>  		front_pad = bs->front_pad;
>  		inline_vecs = BIO_INLINE_VECS;
>  	}

Wouldn't the following be better?

	p = mempool_alloc(bs->bi_pool, gfp_mask);
	if (unlikely(!p) && gfp_mask != saved_gfp) {
		punt_bios_to_rescuer(bs);
		p = mempool_alloc(bs->bi_pool, saved_gfp);
	}

I really hope the usage restriction (don't mix with mempool) for
bioset is clearly documented somewhere appropriate.

Thanks.

-- 
tejun




More information about the dm-devel mailing list