[dm-devel] [PATCH] dm: fix free_rq_clone() NULL pointer when requeueing unmapped request

Mike Snitzer snitzer at redhat.com
Wed Apr 29 18:53:45 UTC 2015


On Wed, Apr 29 2015 at  9:34am -0400,
Mike Snitzer <snitzer at redhat.com> wrote:

> On Wed, Apr 29 2015 at  9:20am -0400,
> Christoph Hellwig <hch at lst.de> wrote:
> 
> > On Tue, Apr 28, 2015 at 01:52:20PM +0200, Bart Van Assche wrote:
> > > Hello,
> > >
> > > Earlier today I started testing an SRP initiator patch series on top of 
> > > Linux kernel v4.1-rc1. Although that patch series works reliably on top of 
> > > kernel v4.0, a test during which I triggered scsi_remove_host() + relogin 
> > > (for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait; 
> > > srp_daemon -oaec) triggered the following kernel oops:
> > 
> > Can you try the patch below?  From my cursory reading of the dm code
> > it can have tio->clone allocated for a while before it sets up the ->q
> > pointer for it:
> > 
> > diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> > index f8c7ca3..ee74764 100644
> > --- a/drivers/md/dm.c
> > +++ b/drivers/md/dm.c
> > @@ -1089,7 +1089,7 @@ static void free_rq_clone(struct request *clone)
> >  
> >  	blk_rq_unprep_clone(clone);
> >  
> > -	if (clone->q->mq_ops)
> > +	if (clone->q && clone->q->mq_ops)
> >  		tio->ti->type->release_clone_rq(clone);
> >  	else if (!md->queue->mq_ops)
> >  		/* request_fn queue stacked on request_fn queue(s) */
> 
> I'm seeing this same crash on the completion path (when using your
> tcm_loop script).  But for Bart's case his stacktrace included
> dm_requeue_unmapped_original_request() -- which if called from
> map_request() implies clone->q won't have been initialized given
> __multipath_map()'s code for setting up the old request_fn case.
> 
> Long story short: your fix is right for Bart's crash (but not the ones
> I'm seeing with tcm_loop) -- I'll get it queued up with a proper header
> attributed to you and cc'ing stable as needed.

Actually, here is the proper 4.1-only fix (Bart, please verify that this
works for you):

From: Mike Snitzer <snitzer at redhat.com>
Date: Wed, 29 Apr 2015 10:48:09 -0400
Subject: dm: fix free_rq_clone() NULL pointer when requeueing unmapped request

Commit 022333427a ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops.  It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path

The clone->q check made sense for case #1 but not for #2.  However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).

The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html

Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops.  This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.

Reported-by: Bart Van Assche <bart.vanassche at sandisk.com>
Signed-off-by: Mike Snitzer <snitzer at redhat.com>
---
 drivers/md/dm.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 3d34b5d..5998c26 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1031,16 +1031,24 @@ static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
 	dm_put(md);
 }
 
-static void free_rq_clone(struct request *clone)
+static void free_rq_clone(struct request *clone, bool must_be_mapped)
 {
 	struct dm_rq_target_io *tio = clone->end_io_data;
 	struct mapped_device *md = tio->md;
 
-	if (clone->q->mq_ops)
+	WARN_ON_ONCE(must_be_mapped && !clone->q);
+
+	if (md->type == DM_TYPE_MQ_REQUEST_BASED)
+		/* stacked on blk-mq queue(s) */
 		tio->ti->type->release_clone_rq(clone);
 	else if (!md->queue->mq_ops)
 		/* request_fn queue stacked on request_fn queue(s) */
 		free_clone_request(md, clone);
+	/*
+	 * NOTE: for the blk-mq queue stacked on request_fn queue(s) case:
+	 * no need to call free_clone_request() because we leverage blk-mq by
+	 * allocating the clone at the end of the blk-mq pdu (see: clone_rq)
+	 */
 
 	if (!md->queue->mq_ops)
 		free_rq_tio(tio);
@@ -1071,7 +1079,7 @@ static void dm_end_request(struct request *clone, int error)
 			rq->sense_len = clone->sense_len;
 	}
 
-	free_rq_clone(clone);
+	free_rq_clone(clone, true);
 	if (!rq->q->mq_ops)
 		blk_end_request_all(rq, error);
 	else
@@ -1090,7 +1098,7 @@ static void dm_unprep_request(struct request *rq)
 	}
 
 	if (clone)
-		free_rq_clone(clone);
+		free_rq_clone(clone, false);
 }
 
 /*
-- 
2.3.2 (Apple Git-55)
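
For readers who want to poke at the branching in the patched
free_rq_clone() outside the kernel, below is a minimal userspace sketch.
It is not kernel code: the struct layouts, the has_mq_ops flag (standing
in for q->mq_ops != NULL), the free_action/free_result types and the
model_free_rq_clone() name are all hypothetical stand-ins. Only the order
of the checks is meant to mirror the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields the logic reads. */
enum dm_type { DM_TYPE_REQUEST_BASED, DM_TYPE_MQ_REQUEST_BASED };

struct queue { bool has_mq_ops; };   /* models q->mq_ops != NULL */
struct mapped_device { enum dm_type type; struct queue *queue; };
struct request { struct queue *q; }; /* q stays NULL until the clone is mapped */

/* Which cleanup path the patched function would take for the clone. */
enum free_action { RELEASE_VIA_TARGET, FREE_CLONE_REQUEST, NOTHING_TO_FREE };

struct free_result {
	enum free_action action;
	bool warned;    /* models WARN_ON_ONCE() firing */
	bool freed_tio; /* models the free_rq_tio() call */
};

static struct free_result
model_free_rq_clone(const struct mapped_device *md,
		    const struct request *clone, bool must_be_mapped)
{
	struct free_result r;

	assert(md != NULL && md->queue != NULL && clone != NULL);

	/* Completion path passes true: an unmapped clone there is a bug. */
	r.warned = must_be_mapped && clone->q == NULL;

	/* Decide from the device type, so clone->q is never dereferenced. */
	if (md->type == DM_TYPE_MQ_REQUEST_BASED)
		r.action = RELEASE_VIA_TARGET;   /* stacked on blk-mq queue(s) */
	else if (!md->queue->has_mq_ops)
		r.action = FREE_CLONE_REQUEST;   /* request_fn on request_fn */
	else
		r.action = NOTHING_TO_FREE;      /* clone lives in the blk-mq pdu */

	r.freed_tio = !md->queue->has_mq_ops;
	return r;
}
```

In this model, the requeue of an unmapped request corresponds to calling
model_free_rq_clone() with a clone whose q is NULL and must_be_mapped set
to false: no warning is recorded and nothing is dereferenced through
clone->q. The same clone with must_be_mapped set to true records the
warning, which is the completion-path condition the patch wants flagged.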



