[dm-devel] 4.5-rc1 multipath regression

Bart Van Assche bart.vanassche at sandisk.com
Mon Feb 8 18:16:52 UTC 2016


On 01/29/2016 04:07 PM, Mike Snitzer wrote:
> On Fri, Jan 29 2016 at  1:42pm -0500,
> Bart Van Assche <bart.vanassche at sandisk.com> wrote:
>> On 01/28/2016 03:39 PM, Bart Van Assche wrote:
>>> There is a regression in the 4.5-rc1 kernel with regard to multipath
>>> setup. On my SRP I usually use for these tests after a few minutes a
>>> kernel crash occurs and the console freezes. A screenshot has been attached.
>>
>> (replying to my own e-mail)
> 
> Not sure where you sent your first email.. not seeing it on dm-devel
> archives.
> 
> So I don't have the original screenshot you attached.
> 
> The 4.5 merge window didn't see any changes to DM mpath or DM core.  So
> any regression is very likely outside DM and rooted in SRP or whatever
> other dependencies your setup relies on.

Hello Mike,

The behavior I see with kernel v4.5-rc3 is different from what I saw with
v4.5-rc1, but it is still not the behavior I expect. The call trace that
was triggered this morning on my test setup can be found below. I assume
the information below means that tio->ti->type is NULL in dm_done()?

Bart.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
IP: [<ffffffffa00020e5>] dm_done+0x35/0x1b0 [dm_mod]
PGD 456993067 PUD 40c76a067 PMD 0 
Oops: 0000 [#1] SMP 
Modules linked in: scsi_dh_alua dm_queue_length netconsole autofs4 ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm configfs ib_cm iw_cm dm_round_robin dm_multipath iTCO_wdt iTCO_vendor_support ipmi_devintf dcdbas ipmi_si ipmi_msghandler sb_edac edac_core lpc_ich mfd_core tg3 libphy ptp pps_core sg wmi ext4(E) jbd2(E) mbcache(E) sr_mod(E) cdrom(E) sd_mod(E) ahci(E) libahci(E) mlx4_ib(E) ib_sa(E) ib_mad(E) ib_core(E) ib_addr(E) ipv6(E) mlx4_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
CPU: 0 PID: 618 Comm: kworker/0:1H Tainted: G            E   4.5.0-rc3+ #3
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: kblockd blk_mq_run_work_fn
task: ffff880437fa5e80 ti: ffff880437a6c000 task.ti: ffff880437a6c000
RIP: 0010:[<ffffffffa00020e5>]  [<ffffffffa00020e5>] dm_done+0x35/0x1b0 [dm_mod]
RSP: 0018:ffff88046e403e38  EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8803f6a98d70 RCX: dead000000000200
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffc9000933c040
sd 23:0:0:1: Asymmetric access state changed
device-mapper: multipath: Failing path 67:176.
device-mapper: multipath: Failing path 68:16.
sd 24:0:0:1: Asymmetric access state changed
RBP: ffff88046e403e78 R08: ffff8803f6a98c78 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88006c0f2680
R13: ffff8803f6a98c00 R14: ffff88046e403ec8 R15: 0000000000000005
FS:  0000000000000000(0000) GS:ffff88046e400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000060 CR3: 000000041defd000 CR4: 00000000001406f0
Stack:
 0000000000000003 0000000000000002 ffff88046e403e78 ffff8803f6a98d70
 ffff8803f6a98c00 ffff8803f6a98c00 ffff88046e403ec8 0000000000000005
 ffff88046e403ea8 ffffffffa00022ac ffffffff81a090e0 ffff8803f6a98c78
Call Trace:
 <IRQ> 
 [<ffffffffa00022ac>] dm_softirq_done+0x4c/0xd0 [dm_mod]
 [<ffffffff812476ac>] blk_done_softirq+0x8c/0xb0
 [<ffffffff8105be66>] __do_softirq+0xf6/0x240
 [<ffffffff8105c0bc>] irq_exit+0xac/0xc0
 [<ffffffff8103afde>] smp_call_function_single_interrupt+0x2e/0x40
 [<ffffffff81535779>] call_function_single_interrupt+0x89/0x90
 <EOI> 
 [<ffffffff8153422d>] ? _raw_spin_unlock_irqrestore+0x3d/0x60
 [<ffffffffa03515bc>] multipath_busy+0xcc/0xf0 [dm_multipath]
 [<ffffffffa00045bd>] dm_mq_queue_rq+0x7d/0x180 [dm_mod]
 [<ffffffff81249cdb>] __blk_mq_run_hw_queue+0x29b/0x490
 [<ffffffff810a5fd3>] ? __lock_acquire+0x3b3/0x560
 [<ffffffff81249f10>] blk_mq_run_work_fn+0x10/0x20
 [<ffffffff810723ea>] process_one_work+0x1da/0x480
 [<ffffffff8107237a>] ? process_one_work+0x16a/0x480
 [<ffffffff810a62c4>] ? __lock_release+0xc4/0x3a0
 [<ffffffff81072f39>] worker_thread+0x169/0x520
 [<ffffffff81099d58>] ? complete+0x48/0x60
 [<ffffffff8153422b>] ? _raw_spin_unlock_irqrestore+0x3b/0x60
 [<ffffffff81072dd0>] ? maybe_create_worker+0x110/0x110
 [<ffffffff81072dd0>] ? maybe_create_worker+0x110/0x110
 [<ffffffff8152ee92>] ? schedule+0x42/0xb0
 [<ffffffff81072dd0>] ? maybe_create_worker+0x110/0x110
 [<ffffffff81078f94>] kthread+0xe4/0x100
 [<ffffffff810a4dcd>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff81081c99>] ? schedule_tail+0x19/0xd0
 [<ffffffff81078eb0>] ? __init_kthread_worker+0x70/0x70
 [<ffffffff8153497f>] ret_from_fork+0x3f/0x70
 [<ffffffff81078eb0>] ? __init_kthread_worker+0x70/0x70
Code: 65 e0 48 89 5d d8 49 89 fc 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 48 8b 9f 60 01 00 00 48 8b 7b 08 48 85 ff 74 0c 48 8b 47 08 84 d2 <4c> 8b 40 60 75 44 41 89 f5 41 83 fd 87 0f 84 f2 00 00 00 45 85 
RIP  [<ffffffffa00020e5>] dm_done+0x35/0x1b0 [dm_mod]
 RSP <ffff88046e403e38>
CR2: 0000000000000060
---[ end trace f47c39416952f73a ]---
sd 31:0:0:1: Asymmetric access state changed
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception in interrupt


$ gdb drivers/md/dm-mod.o
(gdb) list *(dm_done+0x35)
0x20e5 is in dm_done (drivers/md/dm.c:1273).
1268            int r = error;
1269            struct dm_rq_target_io *tio = clone->end_io_data;
1270            dm_request_endio_fn rq_end_io = NULL;
1271
1272            if (tio->ti) {
1273                    rq_end_io = tio->ti->type->rq_end_io;
1274
1275                    if (mapped && rq_end_io)
1276                            r = rq_end_io(tio->ti, clone, error, &tio->info);
1277            }



