[dm-devel] bioset_exit poison from dm_destroy
Mike Snitzer
snitzer at redhat.com
Tue May 31 19:49:40 UTC 2022
On Tue, May 31 2022 at 3:00P -0400,
Jens Axboe <axboe at kernel.dk> wrote:
> On 5/31/22 12:58 PM, Mike Snitzer wrote:
> > On Sun, May 29 2022 at 8:46P -0400,
> > Jens Axboe <axboe at kernel.dk> wrote:
> >
> >> On 5/28/22 6:17 PM, Matthew Wilcox wrote:
> >>> Not quite sure whose bug this is. Current Linus head running xfstests
> >>> against ext4 (probably not ext4's fault?)
> >>>
> >>> 01818 generic/250 run fstests generic/250 at 2022-05-28 23:48:09
> >>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
> >>> 01818 EXT4-fs (dm-0): unmounting filesystem.
> >>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
> >>> 01818 EXT4-fs (dm-0): unmounting filesystem.
> >>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
> >>> 01818 Buffer I/O error on dev dm-0, logical block 3670000, async page read
> >>> 01818 EXT4-fs (dm-0): unmounting filesystem.
> >>> 01818 EXT4-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: none.
> >>> 01818 EXT4-fs (dm-0): unmounting filesystem.
> >>> 01818 general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] PREEMPT SMP NOPTI
> >>> 01818 CPU: 0 PID: 1579117 Comm: dmsetup Kdump: loaded Not tainted 5.18.0-11049-g1dec3d7fd0c3-dirty #262
> >>> 01818 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> >>> 01818 RIP: 0010:__cpuhp_state_remove_instance+0xf0/0x1b0
> >>> 01818 Code: a0 f8 d7 81 42 3b 1c 28 7f d9 4c 89 e1 31 d2 89 de 89 7d dc e8 01 fd ff ff 8b 7d dc eb c5 49 8b 04 24 49 8b 54 24 08 48 85 c0 <48> 89 02 74 04 48 89 50 08 48 b8 00 01 00 00 00 00 ad de 48 c7 c7
> >>> 01818 RSP: 0018:ffff888101fcfc60 EFLAGS: 00010286
> >>> 01818 RAX: dead000000000100 RBX: 0000000000000017 RCX: 0000000000000000
> >>> 01818 RDX: dead000000000122 RSI: ffff8881233b0ae8 RDI: ffffffff81e3b080
> >>> 01818 RBP: ffff888101fcfc88 R08: 0000000000000008 R09: 0000000000000003
> >>> 01818 R10: 0000000000000000 R11: 00000000002dc6c0 R12: ffff8881233b0ae8
> >>> 01818 R13: 0000000000000000 R14: ffffffff81e38f58 R15: ffff88817b5a3c00
> >>> 01818 FS: 00007ff56daec280(0000) GS:ffff888275800000(0000) knlGS:0000000000000000
> >>> 01818 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> 01818 CR2: 00005591ad94f198 CR3: 000000017b5a0004 CR4: 0000000000770eb0
> >>> 01818 PKRU: 55555554
> >>> 01818 Call Trace:
> >>> 01818 <TASK>
> >>> 01818 ? kfree+0x66/0x250
> >>> 01818 bioset_exit+0x32/0x210
> >>> 01818 cleanup_mapped_device+0x34/0xf0
> >>> 01818 __dm_destroy+0x149/0x1f0
> >>> 01818 ? table_clear+0xc0/0xc0
> >>> 01818 dm_destroy+0xe/0x10
> >>> 01818 dev_remove+0xd9/0x120
> >>> 01818 ctl_ioctl+0x1cb/0x420
> >>> 01818 dm_ctl_ioctl+0x9/0x10
> >>> 01818 __x64_sys_ioctl+0x89/0xb0
> >>> 01818 do_syscall_64+0x35/0x80
> >>> 01818 entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >>> 01818 RIP: 0033:0x7ff56de3b397
> >>> 01818 Code: 3c 1c e8 1c ff ff ff 85 c0 79 87 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a9 da 0d 00 f7 d8 64 89 01 48
> >>> 01818 RSP: 002b:00007ffe55367ef8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
> >>> 01818 RAX: ffffffffffffffda RBX: 00007ff56df31a8e RCX: 00007ff56de3b397
> >>> 01818 RDX: 000055daad7cab30 RSI: 00000000c138fd04 RDI: 0000000000000003
> >>> 01818 RBP: 00007ffe55367fb0 R08: 00007ff56df81558 R09: 00007ffe55367d60
> >>> 01818 R10: 00007ff56df808a2 R11: 0000000000000206 R12: 00007ff56df808a2
> >>> 01818 R13: 00007ff56df808a2 R14: 00007ff56df808a2 R15: 00007ff56df808a2
> >>> 01818 </TASK>
> >>> 01818 Modules linked in: crct10dif_generic crct10dif_common [last unloaded: crc_t10dif]
> >>
> >> I suspect dm is calling bioset_exit() multiple times? Which it probably
> >> should not.
> >>
> >> The reset of bioset_exit() is resilient against this, so might be best
> >> to include bio_alloc_cache_destroy() in that.
> >>
> >>
> >> diff --git a/block/bio.c b/block/bio.c
> >> index a3893d80dccc..be3937b84e68 100644
> >> --- a/block/bio.c
> >> +++ b/block/bio.c
> >> @@ -722,6 +722,7 @@ static void bio_alloc_cache_destroy(struct bio_set *bs)
> >> bio_alloc_cache_prune(cache, -1U);
> >> }
> >> free_percpu(bs->cache);
> >> + bs->cache = NULL;
> >> }
> >>
> >> /**
> >
> > Yes, we need the above to fix the crash. Does it also make sense to
> > add this?
> >
> > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > index 49eff01fb829..f410c78e9c0c 100644
> > --- a/include/linux/bio.h
> > +++ b/include/linux/bio.h
> > @@ -681,7 +681,7 @@ struct bio_set {
> >
> > static inline bool bioset_initialized(struct bio_set *bs)
> > {
> > - return bs->bio_slab != NULL;
> > + return (bs->bio_slab != NULL || bs->cache != NULL);
> > }
>
> Should not be possible to have valid bs->cache without bs->bio_slab?
Not quite following your question, but I sprinkled this extra check in
purely because DM core uses bioset_initialized() to verify that the
bioset is _not_ yet initialized.
But it really doesn't make sense to consider the bioset _fully_
initialized if bs->bio_slab is NULL but bs->cache is NOT...
Anyway, really don't need this change but we do need your earlier
patch to reset bs->cache to NULL.
Thanks.
More information about the dm-devel
mailing list