[dm-devel] dm-crypt: Fix deadlock when algo returns -EBUSY

Mike Snitzer snitzer at redhat.com
Tue Apr 7 15:55:01 UTC 2015


On Fri, Apr 03 2015 at 12:09pm -0400,
Ben Collins <ben.c at servergy.com> wrote:

> I suspect this doesn't show up for most anyone because software
> algorithms typically don't have a sense of being too busy. However,
> when working with the Freescale CAAM driver, it would, under heavy
> load, return -EBUSY on occasion, which would leave dm-crypt dead
> locked and quite unhappy.
> 
> After checking the logic in some other drivers, the scheme for
> crypt_convert() and it's callback, kcryptd_async_done() were not
> correctly laid out to handle -EBUSY properly (nor -EINPROGRESS for
> that matter).
> 
> This patch corrects that and makes things pretty happy. Where-as I
> could cause a lockup withing 1-2 minutes, I've not been able to run
> the reproduction test with CAAM drivers fully installed for over
> 24 hours now. I've regression tested it against software algorithms
> on PPC32 and x86_64, and things seem perfectly happy there as well.
> 
> I've tested this all the way back to 3.13, and a few iterations
> in-between. Please consider pushing this to various stable trees.
> 
> Cc: dm-devel at redhat.com
> Cc: Alasdair Kergon <agk at redhat.com>
> Cc: Mike Snitzer <snitzer at redhat.com>
> Signed-off-by: Ben Collins <ben.c at servergy.com>
> ---
>  drivers/md/dm-crypt.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index 713a962..3891f26 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -925,11 +925,10 @@ static int crypt_convert(struct crypt_config *cc,
>  
>  		switch (r) {
>  		/* async */
> +		case -EINPROGRESS:
>  		case -EBUSY:
>  			wait_for_completion(&ctx->restart);
>  			reinit_completion(&ctx->restart);
> -			/* fall through*/
> -		case -EINPROGRESS:
>  			ctx->req = NULL;
>  			ctx->cc_sector++;
>  			continue;
> @@ -1346,10 +1345,8 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
>  	struct dm_crypt_io *io = container_of(ctx, struct dm_crypt_io, ctx);
>  	struct crypt_config *cc = io->cc;
>  
> -	if (error == -EINPROGRESS) {
> -		complete(&ctx->restart);
> +	if (error == -EINPROGRESS)
>  		return;
> -	}
>  
>  	if (!error && cc->iv_gen_ops && cc->iv_gen_ops->post)
>  		error = cc->iv_gen_ops->post(cc, iv_of_dmreq(cc, dmreq), dmreq);
> @@ -1359,13 +1356,17 @@ static void kcryptd_async_done(struct crypto_async_request *async_req,
>  
>  	crypt_free_req(cc, req_of_dmreq(cc, dmreq), io->base_bio);
>  
> -	if (!atomic_dec_and_test(&ctx->cc_pending))
> +	if (!atomic_dec_and_test(&ctx->cc_pending)) {
> +		complete(&ctx->restart);
>  		return;
> +	}
>  
>  	if (bio_data_dir(io->base_bio) == READ)
>  		kcryptd_crypt_read_done(io);
>  	else
>  		kcryptd_crypt_write_io_submit(io, 1);
> +
> +	complete(&ctx->restart);
>  }
>  
>  static void kcryptd_crypt(struct work_struct *work)

It looks like you're _always_ using the completion regardless of whether
crypt_convert() will be waiting (e.g. even if error is 0).

I can see this "working" but it seems less than ideal.  Would it be
better to record the need to use the completion in ctx and then
conditionally call complete()?

Mike




More information about the dm-devel mailing list