[dm-devel] Kernel v4.1-rc1 + MQ dm-multipath + MQ SRP oops

Mike Snitzer snitzer at redhat.com
Tue Apr 28 13:52:58 UTC 2015


On Tue, Apr 28 2015 at  7:52am -0400,
Bart Van Assche <bart.vanassche at sandisk.com> wrote:

> Hello,
> 
> Earlier today I started testing an SRP initiator patch series on top
> of Linux kernel v4.1-rc1. Although that patch series works reliably
> on top of kernel v4.0, a test during which I triggered
> scsi_remove_host() + relogin (for p in
> /sys/class/srp_remote_ports/*; do echo 1 >$p/delete & done; wait;
> srp_daemon -oaec) triggered the following kernel oops:
> 
> device-mapper: multipath: Failing path 8:0.
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000138
> IP: [<ffffffffa045f8e9>] free_rq_clone+0x29/0xb0 [dm_mod]
...
 
> In case anyone wants to see the translation of the crash address:
> 
> (gdb) list *(free_rq_clone+0x29)
> 0x919 is in free_rq_clone (drivers/md/dm.c:1092).
> 1087            struct dm_rq_target_io *tio = clone->end_io_data;
> 1088            struct mapped_device *md = tio->md;
> 1089
> 1090            blk_rq_unprep_clone(clone);
> 1091
> 1092            if (clone->q->mq_ops)
> 1093                    tio->ti->type->release_clone_rq(clone);
> 1094            else if (!md->queue->mq_ops)
> 1095                    /* request_fn queue stacked on request_fn queue(s) */
> 1096                    free_clone_request(md, clone);

I saw a crash like this yesterday with 4.1-rc1 (definitely due to
clone->q being NULL) but I didn't get a full backtrace over serial
console so I cannot be sure it is exactly like yours.
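
For anyone following along, here is a minimal userspace sketch of why a
NULL clone->q would fault at a small address like the one in the oops.
It is not kernel code and not a proposed patch: the struct names, the
padding, and the helper functions are all hypothetical, and placing
mq_ops at offset 0x138 is an assumption chosen to match the reported
fault address rather than something taken from the real struct
request_queue layout.

```c
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's real definitions. */
struct model_queue {
	char pad[0x138];	/* assumption: padding so mq_ops lands at 0x138 */
	const void *mq_ops;	/* non-NULL would mean a blk-mq queue */
};

struct model_request {
	struct model_queue *q;	/* NULL models a clone with no queue attached */
};

enum clone_kind {
	CLONE_NO_QUEUE,		/* clone->q is NULL, nothing to look at */
	CLONE_ON_MQ,		/* underlying queue has mq_ops set */
	CLONE_ON_REQUEST_FN	/* underlying queue has no mq_ops */
};

/*
 * Classify a clone without dereferencing a NULL q. The check at
 * dm.c:1092 reads clone->q->mq_ops directly, which is the access
 * this guard is meant to illustrate.
 */
static enum clone_kind classify_clone(const struct model_request *clone)
{
	if (!clone->q)
		return CLONE_NO_QUEUE;
	return clone->q->mq_ops ? CLONE_ON_MQ : CLONE_ON_REQUEST_FN;
}

/*
 * With q == NULL, reading q->mq_ops touches address 0 + offsetof(mq_ops).
 * In this model that is 0x138 by construction.
 */
static size_t mq_ops_fault_address(void)
{
	return offsetof(struct model_queue, mq_ops);
}
```

Calling classify_clone() on a request whose q is NULL returns
CLONE_NO_QUEUE instead of faulting. Whether a guard like this is the
right fix, as opposed to making sure the clone never reaches
free_rq_clone() in that state, is a separate question.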

In my case I was using hch's lio-utils based test setup that he
documented here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00138.html

But I got the crash the first time I ran this script:
multipathd -F
tcm_loop --unload
tcm_node --freedev iblock_0/array

Rough first experience with LIO ;)  So I just chalked it up to tcm_loop
or something not being careful about device lifetime.

So we now have 2 data points (each using a different storage backend).  I
haven't been able to reproduce the issue since, but I have also switched
away from using multipathd to create the multipath device and resorted
to using dmsetup directly (with a dmsetup remove for cleanup instead of
multipath -F).



