[dm-devel] dm_cache BUG() after power loss

Mike Snitzer snitzer at redhat.com
Fri Oct 19 13:03:50 UTC 2018


On Tue, Oct 09 2018 at 12:00pm -0400,
Steinar H. Gunderson <steinar+kernel at gunderson.no> wrote:

> Hi,
> 
> We had a power loss event, and when a server with dm-cache came up again,
> it paniced (see below for the panic text). I couldn't find any other way to
> remedy this than to blow away the metadata volume, which I assumed was safe
> as the cache is in writethrough mode (after several catastrophic events with
> dm-cache earlier, I don't trust writeback anymore). Unfortunately, this was
> seemingly not enough, as the underlying devices came back with various levels
> of corruption and eventually had to be restored from backup. (It's running
> without dm-cache now.)

Please provide the "dmsetup table" line for the cache device if you can.

Are you using writeback mode?

There was a writeback bug that got fixed not too long ago that impacted
users who suffered power loss (or sudden loss of storage), see:
http://git.kernel.org/linus/5b1fe7bec8a8

BUT, it does look like 4.18.11 already has that commit.

Given the "block manager: array validator check failed for block 2156"
error it could easily be that you need to run cache_check and
cache_repair.  Joe (cc'd) may have more specific repair guidance for
you (though Joe is going on vacation.. bad timing).

Mike


> Here's the panic:
> 
> [   13.388089] device-mapper: cache: You have created a cache device with a lot of individual cache blocks (1114672)
> [   13.388089] All these mappings can consume a lot of kernel memory, and take some time to read/write.
> [   13.388089] Please consider increasing the cache block size to reduce the overall cache block count.
> [   13.452782] device-mapper: array: array_block_check failed: blocknr 1082331758718 != wanted 2156
> [   13.462194] device-mapper: block manager: array validator check failed for block 2156
> [   13.470643] device-mapper: array: get_ablock failed
> [   13.475869] device-mapper: cache metadata: dm_array_cursor_next for mapping failed
> [   13.484075] ------------[ cut here ]------------
> [   13.489036] kernel BUG at drivers/md/dm-bufio.c:1180!
> [   13.494443] invalid opcode: 0000 [#1] SMP PTI
> [   13.499144] CPU: 34 PID: 5918 Comm: dmsetup Not tainted 4.18.11 #1
> [   13.505671] Hardware name: Supermicro X10DRi/X10DRI-T, BIOS 1.1 04/14/2015
> [   13.512885] RIP: 0010:dm_bufio_release+0x18/0x74 [dm_bufio]
> [   13.518797] Code: 43 18 48 b8 00 02 00 00 00 00 ad de 48 89 43 20 5b c3 55 53 48 8b 6f 60 48 89 fb 48 89 ef e8 04 7e 20 e1 8b 43 3c 85 c0 75 02 <0f> 0b ff c8 85 c0 89 43 3c 75 47 31 c9 ba 01 00 00 00 be 03 00 00
> [   13.538588] RSP: 0018:ffffc90000a03ba0 EFLAGS: 00010246
> [   13.544157] RAX: 0000000000000000 RBX: ffff881025a6cea0 RCX: 00000000ffffffff
> [   13.551632] RDX: ffff8810303a3980 RSI: ffff881025a6cea0 RDI: ffff881032b37800
> [   13.559110] RBP: ffff881032b37800 R08: 0000000000000000 R09: ffff8800000b8c80
> [   13.566591] R10: ffffc90000a03b00 R11: ffffffff82194947 R12: 0000000000000000
> [   13.574071] R13: ffff881027631340 R14: 00000000000011e4 R15: ffffffffa0127443
> [   13.581550] FS:  00007f3cd0894400(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
> [   13.590256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   13.596337] CR2: 0000563e475af7a8 CR3: 000000102f802006 CR4: 00000000001606e0
> [   13.603812] Call Trace:
> [   13.606614]  dm_array_cursor_end+0x1c/0x27 [dm_persistent_data]
> [   13.612880]  dm_cache_load_mappings+0x2be/0x2fe [dm_cache]
> [   13.618717]  ? retrieve_status+0x176/0x176 [dm_mod]
> [   13.623935]  cache_preresume+0xc6/0x195 [dm_cache]
> [   13.629067]  dm_table_resume_targets+0x38/0xaa [dm_mod]
> [   13.634641]  dm_resume+0x7e/0xa7 [dm_mod]
> [   13.639000]  dev_suspend+0x15b/0x1bc [dm_mod]
> [   13.643706]  ctl_ioctl+0x2f8/0x394 [dm_mod]
> [   13.648238]  dm_ctl_ioctl+0x5/0x8 [dm_mod]
> [   13.652681]  vfs_ioctl+0x19/0x26
> [   13.656247]  do_vfs_ioctl+0x4d0/0x547
> [   13.660255]  ? handle_mm_fault+0x151/0x1b9
> [   13.664695]  ksys_ioctl+0x4b/0x6b
> [   13.668355]  __x64_sys_ioctl+0x11/0x14
> [   13.672446]  do_syscall_64+0x4a/0xd3
> [   13.676365]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   13.681758] RIP: 0033:0x7f3ccff7edd7
> [   13.685670] Code: 00 00 00 48 8b 05 c1 80 2b 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 80 2b 00 f7 d8 64 89 01 48
> [   13.705454] RSP: 002b:00007ffcdf6a6f18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [   13.713640] RAX: ffffffffffffffda RBX: 000055f036494280 RCX: 00007f3ccff7edd7
> [   13.721117] RDX: 000055f036494280 RSI: 00000000c138fd06 RDI: 0000000000000003
> [   13.728595] RBP: 000000000000000f R08: 00007f3cd048a648 R09: 00007ffcdf6a6d80
> [   13.736073] R10: 00007f3cd0489b53 R11: 0000000000000246 R12: 000055f0364942b0
> [   13.743553] R13: 00007f3cd0489b53 R14: 000055f036493030 R15: 0000000000000001
> [   13.751034] Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid1 raid10 raid6_pq raid0 md_mod sd_mod usbhid dm_cache_smq dm_cache dm_bio_prison dm_persistent_data dm_bufio dm_mod libcrc32c crc32c_generic ixgbe i2c_i801 mdio ehci_pci crc32c_intel mpt3sas ahci ptp raid_class i2c_core ehci_hcd libahci pps_core unix
> [   13.783877] ---[ end trace 1140618cbf25a884 ]---
> [   13.792516] RIP: 0010:dm_bufio_release+0x18/0x74 [dm_bufio]
> [   13.798431] Code: 43 18 48 b8 00 02 00 00 00 00 ad de 48 89 43 20 5b c3 55 53 48 8b 6f 60 48 89 fb 48 89 ef e8 04 7e 20 e1 8b 43 3c 85 c0 75 02 <0f> 0b ff c8 85 c0 89 43 3c 75 47 31 c9 ba 01 00 00 00 be 03 00 00
> [   13.818221] RSP: 0018:ffffc90000a03ba0 EFLAGS: 00010246
> [   13.823787] RAX: 0000000000000000 RBX: ffff881025a6cea0 RCX: 00000000ffffffff
> [   13.831260] RDX: ffff8810303a3980 RSI: ffff881025a6cea0 RDI: ffff881032b37800
> [   13.838742] RBP: ffff881032b37800 R08: 0000000000000000 R09: ffff8800000b8c80
> [   13.846220] R10: ffffc90000a03b00 R11: ffffffff82194947 R12: 0000000000000000
> [   13.853701] R13: ffff881027631340 R14: 00000000000011e4 R15: ffffffffa0127443
> [   13.861182] FS:  00007f3cd0894400(0000) GS:ffff88103f480000(0000) knlGS:0000000000000000
> [   13.869889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   13.875988] CR2: 0000563e475af7a8 CR3: 000000102f802006 CR4: 00000000001606e0
> 
> /* Steinar */
> -- 
> Homepage: https://www.sesse.net/
> 
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel




More information about the dm-devel mailing list