[dm-devel] kernel BUG at drivers/md/dm-bufio.c:1155!

Marc Smith msmith626 at gmail.com
Wed May 4 17:50:45 UTC 2022


On Mon, May 2, 2022 at 2:03 PM Marc Smith <msmith626 at gmail.com> wrote:
>
> Hi,
>
> I'm using Linux 5.4.190 (kernel.org) with 'lvmcache' (LVM2 version
> 2.03.14). I use dm-cache (in write-back mode) and dm-writecache, both
> via lvmcache for several logical volumes. I was recently performing
> power-loss tests, and noticed one of the dm-cache (in write-back mode)
> LV's is encountering a problem when I attempt to activate that volume
> group:
> [root at fusion-dvt3a ~]# vgchange -ay 10072a_1_default
> Segmentation fault
> [root at fusion-dvt3a ~]#
>
> Then in the kernel log I see this:
> ...
> [ 4528.074501] device-mapper: cache: Origin device (dm-30) discard
> unsupported: Disabling discard passdown.
> [ 4528.074716] device-mapper: array: array_block_check failed: blocknr
> 16818504669106333287 != wanted 10
> [ 4528.074717] device-mapper: block manager: array validator check
> failed for block 10
> [ 4528.074717] device-mapper: array: get_ablock failed
> [ 4528.074718] device-mapper: cache metadata: dm_array_cursor_next for
> mapping failed
> [ 4528.074727] ------------[ cut here ]------------
> [ 4528.074728] kernel BUG at drivers/md/dm-bufio.c:1155!
> [ 4528.074732] invalid opcode: 0000 [#1] SMP NOPTI
> [ 4528.074757] CPU: 9 PID: 53811 Comm: vgchange Kdump: loaded Tainted:
> P           OE     5.4.190-esos.prod #1
> [ 4528.074797] Hardware name: Quantum H2012/H12SSW-NT, BIOS
> T20201009143356 10/09/2020
> [ 4528.074833] RIP: 0010:dm_bufio_release+0x19/0x75
> [ 4528.074855] Code: fb ff ff 48 89 ef be 01 00 00 00 5d e9 e3 fe ff
> ff 41 54 55 4c 8b 67 78 48 89 fd 4c 89 e7 e8 5b cb 26 00 8b 45 50 85
> c0 75 02 <0f> 0b ff c8 89 45 50 75 48 31 c9 ba 01 00 00 00 be 03 00 00
> 00 49
> [ 4528.074928] RSP: 0018:ffffc90009f73b78 EFLAGS: 00010246
> [ 4528.074952] RAX: 0000000000000000 RBX: ffff88871f1fd340 RCX: 0000000000000007
> [ 4528.074982] RDX: ffff888826ed1600 RSI: ffffc90009f73b58 RDI: ffff88880da7e800
> [ 4528.075012] RBP: ffff8887482a0150 R08: 0000000000000001 R09: 0000000000037100
> [ 4528.075042] R10: 0000000000000000 R11: 0000000000000068 R12: ffff88880da7e800
> [ 4528.075073] R13: ffff88871f1fd478 R14: ffff88871f1fd340 R15: 00000000000001fc
> [ 4528.075103] FS:  00007f48fab9b2c0(0000) GS:ffff88882b040000(0000)
> knlGS:0000000000000000
> [ 4528.075137] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4528.075161] CR2: 00007fae36386288 CR3: 0000000721d20000 CR4: 0000000000340ee0
> [ 4528.075191] Call Trace:
> [ 4528.075207]  dm_array_cursor_end+0x1c/0x27
> [ 4528.075255]  dm_cache_load_mappings+0x2d0/0x322
> [ 4528.075304]  ? clear_discard+0x34/0x34
> [ 4528.075348]  cache_preresume+0x108/0x1e3
> [ 4528.075395]  dm_table_resume_targets+0x38/0xa9
> [ 4528.075443]  __dm_resume+0x33/0x45
> [ 4528.075486]  dm_resume+0x8c/0xb1
> [ 4528.075529]  dev_suspend+0x173/0x1d8
> [ 4528.075573]  ctl_ioctl+0x2a6/0x39d
> [ 4528.075617]  ? retrieve_status+0x19e/0x19e
> [ 4528.075663]  dm_ctl_ioctl+0x5/0x8
> [ 4528.075708]  vfs_ioctl+0x19/0x26
> [ 4528.075751]  do_vfs_ioctl+0x52a/0x554
> [ 4528.075797]  ? ksys_semctl.constprop.0+0xfe/0x15e
> [ 4528.075847]  ? rcu_is_watching+0x5/0x9
> [ 4528.075891]  ksys_ioctl+0x37/0x56
> [ 4528.075935]  __x64_sys_ioctl+0x11/0x14
> [ 4528.075980]  do_syscall_64+0x52/0x60
> [ 4528.076027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 4528.076077] RIP: 0033:0x7f48fac5f617
> [ 4528.076121] Code: c0 78 ed 4c 89 e0 5a 5b 41 5c c3 48 8b 05 71 88
> 0b 00 64 c7 00 26 00 00 00 48 83 c8 ff c3 0f 1f 44 00 00 b8 10 00 00
> 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 88 0b 00 f7 d8 64 89
> 01 48
> [ 4528.076276] RSP: 002b:00007ffe25fa3298 EFLAGS: 00000206 ORIG_RAX:
> 0000000000000010
> [ 4528.076362] RAX: ffffffffffffffda RBX: 0000556c551c3391 RCX: 00007f48fac5f617
> [ 4528.076419] RDX: 0000556c564ff2c0 RSI: 00000000c138fd06 RDI: 0000000000000003
> [ 4528.076475] RBP: 0000556c564f3d30 R08: 0000556c552b7a5b R09: 00007ffe25fa30f8
> [ 4528.076532] R10: 0000556c552beb30 R11: 0000000000000206 R12: 0000556c564ff2f0
> [ 4528.076589] R13: 0000556c564ff2c0 R14: 0000556c564ff370 R15: 0000000000000001
> [ 4528.076646] Modules linked in: dm_cache_smq pci_stub nvmet_rdma(O)
> bcache celerity16fc_cfg(O) nvmet_fc(O) nvmet(O) qla2xxx(O) nvme_fc(O)
> nvme_fabrics(O) bonding ntb_transport ntb_hw_switchtec(O)
> cls_switchtec(O) qede qed bna rdmavt(O) ib_umad(O) rdma_ucm(O)
> ib_srp(O) rdma_cm(O) iw_cm(O) ib_cm(O) iw_cxgb4(O) iw_cxgb3(O)
> ib_qib(O) mlx5_ib(O) mlx5_core(O) mlxdevm(O) pci_hyperv_intf mlxfw(O)
> ib_uverbs(O) ib_core(O) auxiliary(O) mlx4_ib(O) ib_mthca(O)
> bnxt_en(OE) nvme(O) nvme_core(O) mlx_compat(O) [last unloaded: scst]
> [ 4528.077476] ---[ end trace 97c700627c684afd ]---
> [ 4528.083087] RIP: 0010:dm_bufio_release+0x19/0x75
> [ 4528.083189] Code: fb ff ff 48 89 ef be 01 00 00 00 5d e9 e3 fe ff
> ff 41 54 55 4c 8b 67 78 48 89 fd 4c 89 e7 e8 5b cb 26 00 8b 45 50 85
> c0 75 02 <0f> 0b ff c8 89 45 50 75 48 31 c9 ba 01 00 00 00 be 03 00 00
> 00 49
> [ 4528.083397] RSP: 0018:ffffc90009f73b78 EFLAGS: 00010246
> [ 4528.083499] RAX: 0000000000000000 RBX: ffff88871f1fd340 RCX: 0000000000000007
> [ 4528.083608] RDX: ffff888826ed1600 RSI: ffffc90009f73b58 RDI: ffff88880da7e800
> [ 4528.083717] RBP: ffff8887482a0150 R08: 0000000000000001 R09: 0000000000037100
> [ 4528.083827] R10: 0000000000000000 R11: 0000000000000068 R12: ffff88880da7e800
> [ 4528.083936] R13: ffff88871f1fd478 R14: ffff88871f1fd340 R15: 00000000000001fc
> [ 4528.084051] FS:  00007f48fab9b2c0(0000) GS:ffff88882b040000(0000)
> knlGS:0000000000000000
> [ 4528.084194] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4528.084298] CR2: 00007fae36386288 CR3: 0000000721d20000 CR4: 0000000000340ee0
> ...
>
> The "blocknr" value of "16818504669106333287" caught my eye, as that
> number seems way out of bounds for the origin/cache devices in this
> setup (the cache LV size in this case is 256 MiB, and the origin
> device is less than 500 GiB).
>
> At first I suspected cache device corruption, so I used the PV data
> offset + PE segment sizes and offset to extract an image of the cache
> LV into a separate file so I can examine it with the cache_* user land
> tools, and this is what I see:
> [root at fusion-dvt3a ~]# cache_dump /mnt/data/md122_C02_cvol.img
> <superblock uuid="" block_size="128" nr_cache_blocks="3968"
> policy="mq" hint_width="4">
>   <mappings>
>     <mapping cache_block="0" origin_block="7325368" dirty="false"/>
>     <mapping cache_block="1" origin_block="7325369" dirty="false"/>
>     <mapping cache_block="2" origin_block="7325370" dirty="false"/>
>     <mapping cache_block="3" origin_block="7325371" dirty="false"/>
>     <mapping cache_block="4" origin_block="7325372" dirty="false"/>
>     <mapping cache_block="5" origin_block="7325373" dirty="false"/>
>     <mapping cache_block="6" origin_block="7325374" dirty="false"/>
>     <mapping cache_block="7" origin_block="7325375" dirty="false"/>
>     <mapping cache_block="8" origin_block="0" dirty="false"/>
>     <mapping cache_block="9" origin_block="7325351" dirty="false"/>
>     <mapping cache_block="10" origin_block="1" dirty="false"/>
>     <mapping cache_block="11" origin_block="2" dirty="false"/>
>     <mapping cache_block="12" origin_block="4" dirty="false"/>
>     <mapping cache_block="13" origin_block="8" dirty="false"/>
>     <mapping cache_block="14" origin_block="16" dirty="false"/>
>     <mapping cache_block="15" origin_block="32" dirty="false"/>
>     <mapping cache_block="16" origin_block="64" dirty="false"/>
>     <mapping cache_block="17" origin_block="3" dirty="false"/>
>     <mapping cache_block="18" origin_block="5" dirty="false"/>
>     <mapping cache_block="19" origin_block="6" dirty="false"/>
>     <mapping cache_block="20" origin_block="7" dirty="false"/>
>     <mapping cache_block="21" origin_block="9" dirty="false"/>
>     <mapping cache_block="22" origin_block="10" dirty="false"/>
>     <mapping cache_block="23" origin_block="11" dirty="false"/>
>     <mapping cache_block="24" origin_block="12" dirty="false"/>
>     <mapping cache_block="25" origin_block="13" dirty="false"/>
>     <mapping cache_block="26" origin_block="14" dirty="false"/>
>     <mapping cache_block="27" origin_block="15" dirty="false"/>
>     <mapping cache_block="28" origin_block="7322946" dirty="false"/>
>     <mapping cache_block="29" origin_block="7324340" dirty="false"/>
>     <mapping cache_block="30" origin_block="7324341" dirty="false"/>
>     <mapping cache_block="31" origin_block="7324342" dirty="false"/>
>     <mapping cache_block="32" origin_block="7324343" dirty="false"/>
>     <mapping cache_block="33" origin_block="7324344" dirty="false"/>
>     <mapping cache_block="34" origin_block="7324345" dirty="false"/>
>     <mapping cache_block="35" origin_block="7324346" dirty="false"/>
>     <mapping cache_block="36" origin_block="7324347" dirty="false"/>
>   </mappings>
>   <hints>
>     <hint cache_block="0" data="PwAAAA=="/>
>     <hint cache_block="1" data="PwAAAA=="/>
>     <hint cache_block="2" data="PwAAAA=="/>
>     <hint cache_block="3" data="PwAAAA=="/>
>     <hint cache_block="4" data="PwAAAA=="/>
>     <hint cache_block="5" data="PwAAAA=="/>
>     <hint cache_block="6" data="PwAAAA=="/>
>     <hint cache_block="7" data="PwAAAA=="/>
>     <hint cache_block="8" data="PwAAAA=="/>
>     <hint cache_block="9" data="PwAAAA=="/>
>     <hint cache_block="10" data="PwAAAA=="/>
>     <hint cache_block="11" data="PwAAAA=="/>
>     <hint cache_block="12" data="PwAAAA=="/>
>     <hint cache_block="13" data="PwAAAA=="/>
>     <hint cache_block="14" data="PwAAAA=="/>
>     <hint cache_block="15" data="PwAAAA=="/>
>     <hint cache_block="16" data="PwAAAA=="/>
>     <hint cache_block="17" data="PwAAAA=="/>
>     <hint cache_block="18" data="PwAAAA=="/>
>     <hint cache_block="19" data="PwAAAA=="/>
>     <hint cache_block="20" data="PwAAAA=="/>
>     <hint cache_block="21" data="AAAAAA=="/>
>     <hint cache_block="22" data="AAAAAA=="/>
>     <hint cache_block="23" data="AAAAAA=="/>
>     <hint cache_block="24" data="AAAAAA=="/>
>     <hint cache_block="25" data="AAAAAA=="/>
>     <hint cache_block="26" data="AAAAAA=="/>
>     <hint cache_block="27" data="AAAAAA=="/>
>     <hint cache_block="28" data="AAAAAA=="/>
>     <hint cache_block="29" data="AAAAAA=="/>
>     <hint cache_block="30" data="AAAAAA=="/>
>     <hint cache_block="31" data="AAAAAA=="/>
>     <hint cache_block="32" data="AAAAAA=="/>
>     <hint cache_block="33" data="AAAAAA=="/>
>     <hint cache_block="34" data="AAAAAA=="/>
>     <hint cache_block="35" data="AAAAAA=="/>
>     <hint cache_block="36" data="AAAAAA=="/>
>   </hints>
> </superblock>
> [root at fusion-dvt3a ~]# cache_check /mnt/data/md122_C02_cvol.img
> examining superblock
> examining mapping array
> examining hint array
> examining discard bitset
> [root at fusion-dvt3a ~]# echo $?
> 0
> [root at fusion-dvt3a ~]#
>
> Nothing seems out of the ordinary to me, and 'cache_check' doesn't
> report any errors/problems with the metadata.
>
> So I then used a "debug" Linux Kernel (same 5.4.190 code) that
> contains additional debugging features (full preemption, lock
> debugging, etc.). It also has the
> CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING feature enabled, which I suspect
> might be related to the persistent-data stuff used by dm-cache. And
> when I boot this kernel that has the additional debug features, I do
> NOT hit the BUG_ON() in dm_bufio_release() and everything works
> properly -- the VG/LV's activate just fine and are usable, no kernel
> errors / call traces logged.
>
> I switched back and fourth between these two kernels several times,
> just to make sure it wasn't a fluke. And indeed, I am unable to hit
> this problem with the "debug" kernel.
>
> Based on the experiment between the two kernel configs and the user
> land tools not reporting an error, it sounds like my cache metadata is
> sound, but hitting some other issue that I don't understand yet.
>
> Wondering if anyone has any tips / guidance on additional debugging I
> can do? I plan to circle back and take my original kernel config and
> enable CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING to see if that has any
> effect.

I enabled these two additional kernel config options and re-tested:
CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING=y
CONFIG_DM_DEBUG_BLOCK_STACK_TRACING=y

And then when activating the VG, the 'lvm' (vgchange) tool still seg
faults, but the kernel log and call trace is a bit different with
these options enabled:
...
[ 7134.908886] device-mapper: cache: Origin device (dm-30) discard
unsupported: Disabling discard passdown.
[ 7134.909088] device-mapper: array: array_block_check failed: blocknr
16818504669106333287 != wanted 10
[ 7134.909090] device-mapper: block manager: array validator check
failed for block 10
[ 7134.909090] device-mapper: array: get_ablock failed
[ 7134.909091] device-mapper: cache metadata: dm_array_cursor_next for
mapping failed
[ 7134.909102] ------------[ cut here ]------------
[ 7134.909102] kernel BUG at drivers/md/persistent-data/dm-block-manager.c:246!
[ 7134.909107] invalid opcode: 0000 [#1] SMP NOPTI
[ 7134.909133] CPU: 13 PID: 75361 Comm: vgchange Kdump: loaded
Tainted: P           OE     5.4.190-esos.prod #1
[ 7134.909175] Hardware name: Quantum H2012/H12SSW-NT, BIOS
T20201009143356 10/09/2020
[ 7134.909214] RIP: 0010:bl_up_read+0xf/0x43
[ 7134.909233] Code: 48 83 f8 04 75 f0 e8 69 fc ff ff 89 c0 48 c7 44
c2 18 00 00 00 00 e9 e1 fb ff ff 55 48 89 fd e8 ab ff 25 00 83 7d 04
00 7f 02 <0f> 0b 48 89 ef 65 48 8b 34 25 40 ad 01 00 e8 b1 ff ff ff 48
8b 45
[ 7134.909311] RSP: 0018:ffffc90009f1bb68 EFLAGS: 00010246
[ 7134.909335] RAX: 0000000000000000 RBX: ffff88878e2e6b40 RCX: 0000000000000007
[ 7134.909366] RDX: 0000000000000001 RSI: ffffc90009f1bb54 RDI: ffff888748f49858
[ 7134.909397] RBP: ffff888748f49858 R08: 0000000000000001 R09: 0000000000036d00
[ 7134.909428] R10: 0000000000000000 R11: 0000000000000068 R12: ffff888748f49858
[ 7134.909459] R13: ffff88878e2e6c78 R14: ffff88878e2e6b40 R15: 00000000000001fc
[ 7134.909491] FS:  00007fdfa39b72c0(0000) GS:ffff88882b140000(0000)
knlGS:0000000000000000
[ 7134.909526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7134.909552] CR2: 0000559be7639038 CR3: 00000007a5a72000 CR4: 0000000000340ee0
[ 7134.909583] Call Trace:
[ 7134.909597]  dm_bm_unlock+0x2f/0x3a
[ 7134.909642]  dm_array_cursor_end+0x1c/0x27
[ 7134.909690]  dm_cache_load_mappings+0x2d0/0x322
[ 7134.909739]  ? clear_discard+0x34/0x34
[ 7134.909784]  cache_preresume+0x108/0x1e3
[ 7134.909831]  dm_table_resume_targets+0x38/0xa9
[ 7134.909880]  __dm_resume+0x33/0x45
[ 7134.909923]  dm_resume+0x8c/0xb1
[ 7134.909967]  dev_suspend+0x173/0x1d8
[ 7134.910011]  ctl_ioctl+0x2a6/0x39d
[ 7134.910055]  ? retrieve_status+0x19e/0x19e
[ 7134.910103]  dm_ctl_ioctl+0x5/0x8
[ 7134.910148]  vfs_ioctl+0x19/0x26
[ 7134.910191]  do_vfs_ioctl+0x52a/0x554
[ 7134.910237]  ? ksys_semctl.constprop.0+0xfe/0x15e
[ 7134.910287]  ? rcu_is_watching+0x5/0x9
[ 7134.910333]  ksys_ioctl+0x37/0x56
[ 7134.910376]  __x64_sys_ioctl+0x11/0x14
[ 7134.910423]  do_syscall_64+0x52/0x60
[ 7134.910469]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7134.910520] RIP: 0033:0x7fdfa3a7b617
[ 7134.910564] Code: c0 78 ed 4c 89 e0 5a 5b 41 5c c3 48 8b 05 71 88
0b 00 64 c7 00 26 00 00 00 48 83 c8 ff c3 0f 1f 44 00 00 b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 88 0b 00 f7 d8 64 89
01 48
[ 7134.910722] RSP: 002b:00007fffcfccb488 EFLAGS: 00000206 ORIG_RAX:
0000000000000010
[ 7134.910810] RAX: ffffffffffffffda RBX: 0000559be5dae391 RCX: 00007fdfa3a7b617
[ 7134.910868] RDX: 0000559be76589c0 RSI: 00000000c138fd06 RDI: 0000000000000003
[ 7134.910925] RBP: 0000559be764c830 R08: 0000559be5ea2a5b R09: 00007fffcfccb2e8
[ 7134.910983] R10: 0000559be5ea9b30 R11: 0000000000000206 R12: 0000559be76589f0
[ 7134.911041] R13: 0000559be76589c0 R14: 0000559be7658a70 R15: 0000000000000001
[ 7134.911100] Modules linked in: dm_cache_smq pci_stub nvmet_rdma(O)
bcache celerity16fc_cfg(O) nvmet_fc(O) nvmet(O) qla2xxx(O) nvme_fc(O)
nvme_fabrics(O) bonding ntb_transport ntb_hw_switchtec(O)
cls_switchtec(O) qede qed bna rdmavt(O) ib_umad(O) rdma_ucm(O)
ib_srp(O) rdma_cm(O) iw_cm(O) ib_cm(O) iw_cxgb4(O) iw_cxgb3(O)
ib_qib(O) mlx5_ib(O) mlx5_core(O) mlxdevm(O) pci_hyperv_intf mlxfw(O)
ib_uverbs(O) ib_core(O) auxiliary(O) mlx4_ib(O) ib_mthca(O)
bnxt_en(OE) nvme(O) nvme_core(O) mlx_compat(O) [last unloaded: scst]
[ 7134.911937] ---[ end trace bf6823c5ee51ae83 ]---
[ 7134.917732] RIP: 0010:bl_up_read+0xf/0x43
[ 7134.917831] Code: 48 83 f8 04 75 f0 e8 69 fc ff ff 89 c0 48 c7 44
c2 18 00 00 00 00 e9 e1 fb ff ff 55 48 89 fd e8 ab ff 25 00 83 7d 04
00 7f 02 <0f> 0b 48 89 ef 65 48 8b 34 25 40 ad 01 00 e8 b1 ff ff ff 48
8b 45
[ 7134.918042] RSP: 0018:ffffc90009f1bb68 EFLAGS: 00010246
[ 7134.918145] RAX: 0000000000000000 RBX: ffff88878e2e6b40 RCX: 0000000000000007
[ 7134.918255] RDX: 0000000000000001 RSI: ffffc90009f1bb54 RDI: ffff888748f49858
[ 7134.918365] RBP: ffff888748f49858 R08: 0000000000000001 R09: 0000000000036d00
[ 7134.918475] R10: 0000000000000000 R11: 0000000000000068 R12: ffff888748f49858
[ 7134.918585] R13: ffff88878e2e6c78 R14: ffff88878e2e6b40 R15: 00000000000001fc
[ 7134.918699] FS:  00007fdfa39b72c0(0000) GS:ffff88882b140000(0000)
knlGS:0000000000000000
[ 7134.918843] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7134.918950] CR2: 0000559be7639038 CR3: 00000007a5a72000 CR4: 0000000000340ee0
...


--Marc


>
>
> Thanks for your time.
>
> --Marc



More information about the dm-devel mailing list