[dm-devel] [v4.13-rc BUG] system lockup when running big buffered write(4M) to IB SRP via mpath

Ming Lei ming.lei at redhat.com
Wed Aug 23 12:46:56 UTC 2017


On Wed, Aug 23, 2017 at 07:35:26PM +0800, Ming Lei wrote:
> On Wed, Aug 09, 2017 at 05:10:01PM +0000, Bart Van Assche wrote:
> > On Wed, 2017-08-09 at 12:43 -0400, Laurence Oberman wrote:
> > > Your latest patch on stock upstream without Ming's latest patches is 
> > > behaving for me.
> > > 
> > > As already mentioned, the requeue -11 and clone failure messages are 
> > > gone and I am not actually seeing any soft lockups or hard lockups.
> > > 
> > > When Ming gets back I will work with him on his patch set and the lockups.
> > > 
> > > Running 10 parallel writes which easily trips into soft lockups on 
> > > Ming's kernel (even with your patch) has been stable here on 4.13-RC3 
> > > with your patch.
> > > 
> > > I will leave it running for a while now but the patch is good.
> > > 
> > > If it survives 4 hours I will add a Tested-by to your latest patch.
> > 
> > Hello Laurence,
> > 
> > I'm working on an additional patch that should reduce unnecessary requeuing
> > even further. I will let you know when it's ready.
> > 
> > Additionally, please trim e-mails when replying such that e-mails do not get
> > too long.
> 
> soft lockup still can be observed easily with patch d4acf3650c7c(
> block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time),
> but no hard lockup.
> 
> With the patchset of 'blk-mq-sched: improve SCSI-MQ performance', hard
> lockup can be observed following some failure log:
> 
> 	[  269.277653] device-mapper: multipath: blk_get_request() returned -11 - requeuing
> 	[  269.321244] device-mapper: multipath: blk_get_request() returned -11 - requeuing
> 	...
> 	[  273.421688] scsi host2: SRP abort called
> 	[  273.444577] scsi host2: Sending SRP abort for tag 0x6007e
> 	[  273.673871] scsi host2: Null scmnd for RSP w/tag 0x0000000006007e received on ch 6 / QP 0x30
> 	...
> 	[  274.372110] device-mapper: multipath: blk_get_request() returned -11 - requeuing
> 	[  278.658671] scsi host2: SRP abort called
> 	[  278.690630] scsi host2: SRP abort called
> 	[  278.717634] scsi host2: SRP abort called
> 	[  278.745629] scsi host2: SRP abort called
> 	[  279.083227] multipath_clone_and_map: 1092 callbacks suppressed
> 	....
> 	[  296.210503] scsi host2: SRP reset_device called
> 	....
> 	[  303.784287] NMI watchdog: Watchdog detected hard LOCKUP on cpu 10
> 
> The trick thing is that both hard lockup and soft lockup share
> one same stack trace.

Actually I can reproduce hard lockup too on the latest linus tree(v4.13-rc6+)
by just making the test more aggressive:

1) run hammer_write.sh 32-concurrently. 
2) write 8M each time

So it isn't the patchset of 'blk-mq-sched: improve SCSI-MQ performance'
which causes the lockup.

There must be issue somewhere else, and one thing I saw is that
dm's req's completion handling isn't very efficient:

- when low level driver's req is completed, blk_update_request()
is called from scsi_end_request()

- inside blk_update_request(), end_clone_bio() is called for
each bio

- in end_clone_bio(), blk_update_request() is called to
do partial update for the dm req. That means if one request
has N bios, the same dm req need to do the expensive partial
update N times. In theory, the partial update might be avoided,
I guess the reason is that the dm rq can't be retrieved
via the low level driver's rq.


[1] hard lockup log

[  434.068240] NMI watchdog: Watchdog detected hard LOCKUP on cpu 2
[  434.068243] Modules linked in: dm_round_robin xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack rpcrdma iptable_mangle ib_isert iscsi_target_mod iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ib_iser libiscsi ip6_tables scsi_transport_iscsi iptable_filter target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
[  434.068279]  pcbc aesni_intel iTCO_wdt crypto_simd gpio_ich ipmi_si iTCO_vendor_support cryptd joydev glue_helper ipmi_devintf hpwdt acpi_power_meter hpilo sg ipmi_msghandler lpc_ich i7core_edac pcspkr shpchp pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables xfs libcrc32c sd_mod amdkfd amd_iommu_v2 radeon mlx5_core i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm mlxfw drm ptp hpsa pps_core crc32c_intel serio_raw bnx2 i2c_core devlink scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ib_srpt]
[  434.068314] CPU: 2 PID: 1039 Comm: dd Tainted: G          I     4.13.0-rc6.linus+ #28
[  434.068314] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[  434.068316] task: ffff9458f4498040 task.stack: ffffbdb48fd10000
[  434.068323] RIP: 0010:__blk_recalc_rq_segments+0xd9/0x3d0
[  434.068324] RSP: 0018:ffff945b37a43d08 EFLAGS: 00000086
[  434.068325] RAX: ffff9458978a16c0 RBX: 0000000000001000 RCX: ffff9458978a16c0
[  434.068326] RDX: 0000000000000000 RSI: ffff9458978a1748 RDI: ffff9458978a69c8
[  434.068327] RBP: ffff945b37a43d88 R08: 0000000000014000 R09: 0000000000000001
[  434.068328] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  434.068329] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000001d
[  434.068330] FS:  00007f27b13ea740(0000) GS:ffff945b37a40000(0000) knlGS:0000000000000000
[  434.068331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  434.068332] CR2: 00007f27aa600000 CR3: 0000000b1e2af000 CR4: 00000000000006e0
[  434.068333] Call Trace:
[  434.068335]  <IRQ>
[  434.068340]  ? mempool_free+0x2b/0x80
[  434.068343]  blk_recalc_rq_segments+0x28/0x40
[  434.068345]  blk_update_request+0x249/0x310
[  434.068363]  end_clone_bio+0x46/0x70 [dm_mod]
[  434.068367]  bio_endio+0xa1/0x120
[  434.068370]  blk_update_request+0xb7/0x310
[  434.068374]  scsi_end_request+0x38/0x1d0
[  434.068375]  scsi_io_completion+0x13c/0x630
[  434.068378]  scsi_finish_command+0xd9/0x120
[  434.068381]  scsi_softirq_done+0x12a/0x150
[  434.068383]  __blk_mq_complete_request_remote+0x13/0x20
[  434.068387]  flush_smp_call_function_queue+0x52/0x110
[  434.068388]  generic_smp_call_function_single_interrupt+0x13/0x30
[  434.068391]  smp_call_function_single_interrupt+0x27/0x40
[  434.068393]  call_function_single_interrupt+0x93/0xa0
[  434.068396] RIP: 0010:copy_user_generic_string+0x2c/0x40
[  434.068397] RSP: 0018:ffffbdb48fd13c78 EFLAGS: 00010246 ORIG_RAX: ffffffffffffff04
[  434.068398] RAX: 00007f27aa1f3000 RBX: 0000000000001000 RCX: 0000000000000040
[  434.068399] RDX: 0000000000000000 RSI: 00007f27aa1f2e00 RDI: ffff945a2f7b9e00
[  434.068400] RBP: ffffbdb48fd13c80 R08: ffff945a2fcf20b0 R09: 0000000000001000
[  434.068400] R10: 00000000fffff000 R11: 0000000000000000 R12: 0000000000001000
[  434.068401] R13: ffff945a2f7ba000 R14: ffffbdb48fd13e20 R15: ffff945be4817950
[  434.068402]  </IRQ>
[  434.068405]  ? I_BDEV+0x20/0x20
[  434.068408]  ? copyin+0x26/0x30
[  434.068410]  iov_iter_copy_from_user_atomic+0xbd/0x2d0
[  434.068412]  generic_perform_write+0xdd/0x1b0
[  434.068415]  __generic_file_write_iter+0x19b/0x1e0
[  434.068416]  blkdev_write_iter+0x8a/0x100
[  434.068420]  __vfs_write+0xf3/0x170
[  434.068423]  vfs_write+0xb2/0x1b0
[  434.068427]  ? syscall_trace_enter+0x1d0/0x2b0
[  434.068429]  SyS_write+0x55/0xc0
[  434.068431]  do_syscall_64+0x67/0x150
[  434.068434]  entry_SYSCALL64_slow_path+0x25/0x25
[  434.068435] RIP: 0033:0x7f27b0f0dc90
[  434.068436] RSP: 002b:00007ffe24d1d178 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  434.068438] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f27b0f0dc90
[  434.068438] RDX: 0000000000800000 RSI: 00007f27aa0fa000 RDI: 0000000000000001
[  434.068439] RBP: 0000000000800000 R08: ffffffffffffffff R09: 0000000000802003
[  434.068440] R10: 00007ffe24d1cf00 R11: 0000000000000246 R12: 0000000000800000
[  434.068441] R13: 00007f27aa0fa000 R14: 00007f27aa8fa000 R15: 0000000000000000
[  434.068442] Code: 06 00 00 88 44 24 17 8b 59 28 44 8b 71 2c 44 8b 61 34 85 db 0f 84 92 01 00 00 45 89 f5 45 89 e2 4c 89 ee 48 c1 e6 04 48 03 71 78 <8b> 46 08 48 8b 16 44 29 e0 48 89 54 24 30 39 d8 0f 47 c3 44 03 
[  434.068461] Kernel panic - not syncing: Hard LOCKUP
[  434.068463] CPU: 2 PID: 1039 Comm: dd Tainted: G          I     4.13.0-rc6.linus+ #28
[  434.068463] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[  434.068463] Call Trace:
[  434.068464]  <NMI>
[  434.068467]  dump_stack+0x63/0x87
[  434.068470]  panic+0xeb/0x245
[  434.068472]  nmi_panic+0x3f/0x40
[  434.068477]  watchdog_overflow_callback+0x106/0x130
[  434.068480]  __perf_event_overflow+0x54/0xe0
[  434.068482]  perf_event_overflow+0x14/0x20
[  434.068486]  intel_pmu_handle_irq+0x203/0x4b0
[  434.068490]  perf_event_nmi_handler+0x2d/0x50
[  434.068492]  nmi_handle+0x61/0x110
[  434.068494]  default_do_nmi+0xf6/0x120
[  434.068495]  do_nmi+0x113/0x190
[  434.068497]  end_repeat_nmi+0x1a/0x1e
[  434.068499] RIP: 0010:__blk_recalc_rq_segments+0xd9/0x3d0
[  434.068499] RSP: 0018:ffff945b37a43d08 EFLAGS: 00000086
[  434.068500] RAX: ffff9458978a16c0 RBX: 0000000000001000 RCX: ffff9458978a16c0
[  434.068501] RDX: 0000000000000000 RSI: ffff9458978a1748 RDI: ffff9458978a69c8
[  434.068502] RBP: ffff945b37a43d88 R08: 0000000000014000 R09: 0000000000000001
[  434.068502] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  434.068503] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000001d
[  434.068505]  ? __blk_recalc_rq_segments+0xd9/0x3d0
[  434.068508]  ? __blk_recalc_rq_segments+0xd9/0x3d0
[  434.068510]  </NMI>
[  434.068510]  <IRQ>
[  434.068512]  ? mempool_free+0x2b/0x80
[  434.068514]  blk_recalc_rq_segments+0x28/0x40
[  434.068516]  blk_update_request+0x249/0x310
[  434.068521]  end_clone_bio+0x46/0x70 [dm_mod]
[  434.068522]  bio_endio+0xa1/0x120
[  434.068523]  blk_update_request+0xb7/0x310
[  434.068525]  scsi_end_request+0x38/0x1d0
[  434.068527]  scsi_io_completion+0x13c/0x630
[  434.068528]  scsi_finish_command+0xd9/0x120
[  434.068531]  scsi_softirq_done+0x12a/0x150
[  434.068534]  __blk_mq_complete_request_remote+0x13/0x20
[  434.068536]  flush_smp_call_function_queue+0x52/0x110
[  434.068537]  generic_smp_call_function_single_interrupt+0x13/0x30
[  434.068539]  smp_call_function_single_interrupt+0x27/0x40
[  434.068542]  call_function_single_interrupt+0x93/0xa0
[  434.068543] RIP: 0010:copy_user_generic_string+0x2c/0x40
[  434.068544] RSP: 0018:ffffbdb48fd13c78 EFLAGS: 00010246 ORIG_RAX: ffffffffffffff04
[  434.068545] RAX: 00007f27aa1f3000 RBX: 0000000000001000 RCX: 0000000000000040
[  434.068546] RDX: 0000000000000000 RSI: 00007f27aa1f2e00 RDI: ffff945a2f7b9e00
[  434.068546] RBP: ffffbdb48fd13c80 R08: ffff945a2fcf20b0 R09: 0000000000001000
[  434.068547] R10: 00000000fffff000 R11: 0000000000000000 R12: 0000000000001000
[  434.068548] R13: ffff945a2f7ba000 R14: ffffbdb48fd13e20 R15: ffff945be4817950
[  434.068548]  </IRQ>
[  434.068550]  ? I_BDEV+0x20/0x20
[  434.068551]  ? copyin+0x26/0x30
[  434.068553]  iov_iter_copy_from_user_atomic+0xbd/0x2d0
[  434.068554]  generic_perform_write+0xdd/0x1b0
[  434.068557]  __generic_file_write_iter+0x19b/0x1e0
[  434.068560]  blkdev_write_iter+0x8a/0x100
[  434.068562]  __vfs_write+0xf3/0x170
[  434.068565]  vfs_write+0xb2/0x1b0
[  434.068566]  ? syscall_trace_enter+0x1d0/0x2b0
[  434.068568]  SyS_write+0x55/0xc0
[  434.068569]  do_syscall_64+0x67/0x150
[  434.068572]  entry_SYSCALL64_slow_path+0x25/0x25
[  434.068573] RIP: 0033:0x7f27b0f0dc90
[  434.068574] RSP: 002b:00007ffe24d1d178 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  434.068575] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f27b0f0dc90
[  434.068576] RDX: 0000000000800000 RSI: 00007f27aa0fa000 RDI: 0000000000000001
[  434.068577] RBP: 0000000000800000 R08: ffffffffffffffff R09: 0000000000802003
[  434.068578] R10: 00007ffe24d1cf00 R11: 0000000000000246 R12: 0000000000800000
[  434.068579] R13: 00007f27aa0fa000 R14: 00007f27aa8fa000 R15: 0000000000000000
[  434.100416] Kernel Offset: 0x13800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  438.636598] ---[ end Kernel panic - not syncing: Hard LOCKUP


-- 
Ming




More information about the dm-devel mailing list