[dm-devel] Regression due to commit dbaf971c9cdf10843071a60dcafc1aaab3162354 ?

Jean-François Remy jeff at drivescale.com
Mon Mar 16 17:35:51 UTC 2020


Hi,
We’re using multipath with queue mode bio and we’ve run into what seems to be a regression introduced by commit dbaf971c9cdf10843071a60dcafc1aaab3162354 in 5.5 (which was also backported to 5.4).
This happens at the time the multipath device is created.
We’re running on a Cisco box with an mpt3sas HBA controller and SAS drives. The kernel is a vanilla kernel from kernel.org with a few patches in completely unrelated parts of the kernel code, and we use multipath 0.8.3 on Debian Buster.

We initially bisected the issue on the v5.4.x branch down to commit 7e53ea4a1641c463d5369f800734920f1dac56c2. We then verified that a v5.5.9 build without commit dbaf971c9cdf10843071a60dcafc1aaab3162354 does not exhibit the bug, while a build that includes it does.

When booting our test platform with this commit included, we see a lot of kernel WARNING traces like the following one, and the multipath devices are unusable:

[   34.559589] ------------[ cut here ]------------
[   34.559600] WARNING: CPU: 3 PID: 1432 at kernel/workqueue.c:1622 __queue_delayed_work+0x70/0x90
[   34.559600] Modules linked in: dm_service_time nvmet_tcp mlx5_ib mlx5_core ib_uverbs pci_hyperv_intf nvmet_rdma rdma_cm iw_cm ib_cm ib_core nvmet nvme_fabrics iscsi_target_mod target_core_iblock target_core_mod configfs mpt3sas raid_class scsi_transport_sas dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ipmi_ssif crc32_pclmul ghash_clmulni_intel snd_pcm snd_timer snd soundcore aesni_intel mei_me iTCO_wdt crypto_simd cryptd input_leds joydev glue_helper mei iTCO_vendor_support pcspkr ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter mac_hid ip_tables x_tables autofs4 usb_storage hid_generic usbkbd usbmouse usbhid hid fnic libfcoe ahci mxm_wmi libfc libahci lpc_ich enic scsi_transport_fc wmi
[   34.559634] CPU: 3 PID: 1432 Comm: systemd-udevd Not tainted 5.5.8 #98
[   34.559634] Hardware name: Cisco Systems Inc UCSC-C3K-M4SRB/UCSC-C3K-M4SRB, BIOS C3X60M4.4.0.2f.0.1113190831 11/13/2019
[   34.559637] RIP: 0010:__queue_delayed_work+0x70/0x90
[   34.559638] Code: 41 81 f8 00 02 00 00 48 89 4a 30 75 2a e9 c8 cd 06 00 44 89 c7 e9 80 fb ff ff 0f 0b eb cb 0f 0b 48 81 7a 38 40 a0 0a b2 74 ab <0f> 0b 48 83 7a 28 00 74 a9 0f 0b eb a5 44 89 c6 e9 ab bc 06 00 90
[   34.559639] RSP: 0018:ffffb6e88e9b3830 EFLAGS: 00010007
[   34.559640] RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000000
[   34.559641] RDX: ffff9e9c38006c30 RSI: ffff9e9c33933c00 RDI: ffff9e9c38006c50
[   34.559642] RBP: ffff9e9c33828e00 R08: 0000000000000200 R09: ffff9e7c326cc458
[   34.559643] R10: 0000000000000000 R11: 01fffffffffffffe R12: 0000000000000000
[   34.559643] R13: ffff9e7c050400b0 R14: ffff9e7c05040000 R15: 0000000000000001
[   34.559644] FS:  00007ff73a5cbd40(0000) GS:ffff9e7c3f6c0000(0000) knlGS:0000000000000000
[   34.559645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   34.559645] CR2: 00007ffdf1e8ea48 CR3: 0000001ff447a001 CR4: 00000000003606e0
[   34.559646] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   34.559646] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   34.559647] Call Trace:
[   34.559651]  queue_delayed_work_on+0x24/0x40
[   34.559656]  __pg_init_all_paths+0x75/0xc0 [dm_multipath]
[   34.559658]  pg_init_all_paths+0x23/0x40 [dm_multipath]
[   34.559660]  __multipath_map_bio+0x1b5/0x230 [dm_multipath]
[   34.559664]  __map_bio+0x42/0x170
[   34.559666]  __split_and_process_non_flush+0x132/0x1d0
[   34.559669]  __split_and_process_bio+0x94/0x240
[   34.559672]  ? blk_throtl_bio+0x141/0xbf0
[   34.559674]  dm_process_bio+0x117/0x230
[   34.559678]  ? generic_make_request_checks+0x23a/0x5c0
[   34.559680]  dm_make_request+0x3b/0xb0
[   34.559681]  generic_make_request+0x11f/0x2e0
[   34.559683]  ? submit_bio+0x72/0x140
[   34.559685]  submit_bio+0x72/0x140
[   34.559689]  mpage_readpages+0x154/0x190
[   34.559692]  ? bdev_evict_inode+0xf0/0xf0
[   34.559697]  read_pages+0x71/0x1a0
[   34.559700]  ? __do_page_cache_readahead+0x199/0x1b0
[   34.559701]  __do_page_cache_readahead+0x199/0x1b0
[   34.559703]  force_page_cache_readahead+0xb7/0xe0
[   34.559705]  generic_file_read_iter+0x7f3/0xbf0
[   34.559708]  ? _copy_to_user+0x22/0x30
[   34.559713]  ? cp_new_stat+0x154/0x190
[   34.559716]  new_sync_read+0x11b/0x1b0
[   34.559718]  vfs_read+0x90/0x130
[   34.559720]  ksys_read+0x5c/0xe0
[   34.559725]  do_syscall_64+0x52/0x1a0
[   34.559730]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   34.559731] RIP: 0033:0x7ff73adac461
[   34.559733] Code: fe ff ff 50 48 8d 3d fe d0 09 00 e8 e9 03 02 00 66 0f 1f 84 00 00 00 00 00 48 8d 05 99 62 0d 00 8b 00 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
[   34.559734] RSP: 002b:00007ffdf1e90ba8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[   34.559735] RAX: ffffffffffffffda RBX: 0000557789799f50 RCX: 00007ff73adac461
[   34.559735] RDX: 0000000000000040 RSI: 000055778978b588 RDI: 0000000000000006
[   34.559736] RBP: 0000557789799fa0 R08: 000055778978b560 R09: 00007ff73ae7e330
[   34.559737] R10: 000055778977d010 R11: 0000000000000246 R12: 0000057541e80000
[   34.559737] R13: 0000000000000040 R14: 000055778978b578 R15: 000055778978b560
[   34.559738] ---[ end trace 865597b9b72c7dc2 ]---


Let me know if there is anything else that would help in understanding what is going on.

Best,
Jean-Francois Remy