[dm-devel] kernel BUG at drivers/scsi/device_handler/scsi_dh_alua.c:662!

Brian Bunker brian at purestorage.com
Thu Sep 3 22:06:26 UTC 2020


Hello all,

We have a customer who has hit the BUG_ON at drivers/scsi/device_handler/scsi_dh_alua.c:662.

It comes from here (scsi_dh_alua.c, ALUA_DH_VER "2.0"):

rcu_read_lock();
list_for_each_entry_rcu(h,
		&tmp_pg->dh_list, node) {
	/* h->sdev should always be valid */
	BUG_ON(!h->sdev);
	h->sdev->access_state = desc[0];
}
rcu_read_unlock();

It seems that this code sends an RTPG request down one path and then tries to set the access state for the
other paths to the same multipath device by walking the SCSI devices on those paths. In our case, a couple of
our controllers reboot, which leads to the ALUA device handler detaching from the now unreachable paths.

When this code is working, I can see that it goes through the remaining paths. For example, if I have 4
paths at the beginning, the loop is hit 4 times. When I reboot a controller and drop to 2 paths, the loop
runs 2 times.

Has this been seen before? Is there a potential race between the ALUA handler detach and trying to access the
SCSI devices down the paths which have just been detached?
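
To make the suspected window concrete, here is a minimal userspace sketch in C. It is not kernel code and not a proposed patch. It assumes (unconfirmed) that a detach can clear h->sdev while the entry is still reachable from the port group's list, and it shows one possible defensive variant of the loop that skips such entries instead of hitting BUG_ON. The model_* names are hypothetical stand-ins, not identifiers from scsi_dh_alua.c.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures; only the
 * fields touched by the loop are modeled. */
struct model_sdev {
	unsigned char access_state;
};

struct model_dh_data {
	struct model_sdev *sdev;	/* NULL once the handler has detached */
	struct model_dh_data *next;	/* stand-in for the dh_list linkage */
};

/* Assumed window: detach has cleared sdev, but the entry is still linked. */
static void model_detach(struct model_dh_data *h)
{
	h->sdev = NULL;
}

/*
 * Walk the port group's handler list and propagate the new access state.
 * Entries whose sdev is already NULL are skipped rather than treated as
 * a fatal error. Returns the number of devices actually updated.
 */
static int model_update_access_state(struct model_dh_data *head,
				     unsigned char state)
{
	int updated = 0;
	struct model_dh_data *h;

	for (h = head; h != NULL; h = h->next) {
		struct model_sdev *sdev = h->sdev;	/* read the pointer once */

		if (!sdev)
			continue;	/* handler detached; nothing to update */
		sdev->access_state = state;
		updated++;
	}
	return updated;
}
```

With 4 linked entries the walk updates 4 devices; after model_detach() on two of them it updates 2 and leaves the detached ones alone. The sketch is single-threaded, so it only illustrates the NULL check. Whether that check is sufficient in the kernel depends on how detach orders clearing sdev against removing the entry from the RCU-protected list.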

The time between the last detach logged and the crash is about 1/10th of a second:
[ 1953.451203] scsi 2:0:3:184: alua: Detached
[ 1953.451217] sd 2:0:1:184: alua: port group 02 state A non-preferred supports tolUSNA
[ 1953.464445] sd 2:0:2:184: alua: port group 03 state A non-preferred supports tolUSNA
[ 1953.473443] sd 2:0:2:184: alua: port group 03 state A non-preferred supports tolUSNA
[ 1953.482134] sd 2:0:1:184: alua: port group 02 state A non-preferred supports tolUSNA
[ 1953.490979] sd 2:0:2:184: alua: port group 03 state A non-preferred supports tolUSNA
[ 1953.499833] sd 2:0:1:184: alua: port group 02 state A non-preferred supports tolUSNA
[ 1953.518793] device-mapper: multipath: Failing path 8:176.
[ 1953.524859] device-mapper: multipath: Failing path 68:496.
[ 1953.531006] device-mapper: multipath: Failing path 69:784.
[ 1953.537152] device-mapper: multipath: Failing path 133:832.
[ 1953.543390] device-mapper: multipath: Failing path 8:1104.
[ 1953.549544] device-mapper: multipath: Failing path 71:1136.
[ 1953.555835] ------------[ cut here ]------------
[ 1953.560995] kernel BUG at drivers/scsi/device_handler/scsi_dh_alua.c:662!

Full stacktrace here:
[ 1953.555835] ------------[ cut here ]------------
[ 1953.560995] kernel BUG at drivers/scsi/device_handler/scsi_dh_alua.c:662!
[ 1953.568581] invalid opcode: 0000 [#1] SMP NOPTI
[ 1953.573642] CPU: 82 PID: 9101 Comm: kworker/82:1 Kdump: loaded Not tainted 5.4.17-2011.5.3.el7uek.x86_64 #2
[ 1953.584517] Hardware name: HPE Superdome Flex/Superdome Flex, BIOS Bundle:3.25.46 SFW:IP147.007.002.132.000.2004232125 04/23/2020
[ 1953.597528] Workqueue: kaluad alua_rtpg_work
[ 1953.602294] RIP: 0010:alua_rtpg+0x7d3/0x7dc
[ 1953.606964] Code: 45 84 c9 0f 84 ae fb ff ff 45 88 8c 24 54 01 00 00 e9 b0 fb ff ff 49 c7 84 24 60 01 00 00 02 00 00 00 41 b2 0a e9 6f ff ff ff <0f> 0b e8 16 5d a5 ff 0f 0b 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5
[ 1953.627922] RSP: 0018:ffff9ee8e4a7fcf0 EFLAGS: 00010046
[ 1953.633756] RAX: ffff8a5cba871c28 RBX: 0000000000000050 RCX: 0000000000000000
[ 1953.641722] RDX: ffff8a634251bb40 RSI: 0000000000000002 RDI: ffff8a5cba871e20
[ 1953.649681] RBP: ffff9ee8e4a7fda8 R08: 0000000000000004 R09: 000000000000003c
[ 1953.657645] R10: 0000000000000000 R11: ffff8ac08dff5100 R12: ffff8a5bc96d5c00
[ 1953.665610] R13: ffff8a5cba871c00 R14: ffff8abb8c3b1a50 R15: ffff8a5cba871e20
[ 1953.673581] FS:  0000000000000000(0000) GS:ffff8acfbeb80000(0000) knlGS:0000000000000000
[ 1953.682612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1953.689027] CR2: 00007ffff7ff4000 CR3: 000000af4400a004 CR4: 00000000007606e0
[ 1953.696992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1953.704959] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1953.712925] PKRU: 55555554
[ 1953.715938] Call Trace:
[ 1953.718676]  ? account_entity_dequeue+0x81/0xad
[ 1953.723734]  alua_rtpg_work+0x257/0x52c
[ 1953.728012]  ? tracing_is_on+0x15/0x28
[ 1953.732203]  ? __switch_to_asm+0x36/0x61
[ 1953.736582]  process_one_work+0x179/0x389
[ 1953.741050]  worker_thread+0x4f/0x3df
[ 1953.745140]  kthread+0x105/0x138
[ 1953.748744]  ? max_active_store+0x80/0x7c
[ 1953.753221]  ? kthread_bind+0x20/0x15
[ 1953.757309]  ret_from_fork+0x24/0x36
[ 1953.761299] Modules linked in: rds_tcp rds fuse btrfs xor zstd_decompress zstd_compress raid6_pq msdos ext4 jbd2 ext2 mbcache nfsv3 nfs_acl nfs lockd grace fscache bonding ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad intel_rapl_msr rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi iTCO_wdt iTCO_vendor_support intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper pcspkr ixgbe mdio dca vfat fat bnxt_re ib_uverbs i40e ib_core ipmi_ssif i2c_i801 lpc_ich joydev sg wmi ipmi_si ipmi_devintf ipmi_msghandler binfmt_misc ip_tables xfs libcrc32c dm_queue_length sr_mod cdrom sd_mod mgag200 uas drm_kms_helper syscopyarea usb_storage sysfillrect dm_multipath sysimgblt fb_sys_fops drm_vram_helper qla2xxx ttm ahci drm nvme_fc bnxt_en libahci megaraid_sas nvme_fabrics libata nvme_core scsi_transport_fc
[ 1953.761355]  i2c_algo_bit dm_mirror dm_region_hash dm_log dm_mod

Thanks,
Brian

Brian Bunker
SW Eng
brian at purestorage.com






