[dm-devel] A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018

Mikulas Patocka mpatocka at redhat.com
Thu Nov 3 15:20:43 UTC 2022



On Thu, 3 Nov 2022, Mikulas Patocka wrote:

> > BTW, is the mempool_free from endio -> dec_count -> complete_io?
> > And io which caused the crash is from dm_io -> async_io / sync_io
> >  -> dispatch_io, seems dm-raid1 can call it instead of dm-raid, so I
> > suppose the io is for mirror image.
> > 
> > Thanks,
> > Guoqing
> 
> I presume that the bug is caused by destruction of a bio set while bio 
> from that set was in progress. When the bio finishes and an attempt is 
> made to free the bio, a crash happens when the code tries to free the bio 
> into a destroyed mempool.
> 
> I can do more testing to validate this theory.
> 
> Mikulas

When I disable tail-call optimizations with "-fno-optimize-sibling-calls", 
I get this stacktrace:

[  200.105367] Call Trace:
[  200.105611]  <TASK>
[  200.105825]  dump_stack_lvl+0x33/0x42
[  200.106196]  dump_stack+0xc/0xd
[  200.106516]  mempool_free.cold+0x22/0x32
[  200.106921]  bio_free+0x49/0x60
[  200.107239]  bio_put+0x95/0x100
[  200.107567]  super_written+0x4f/0x120 [md_mod]
[  200.108020]  bio_endio+0xe8/0x100
[  200.108359]  __dm_io_complete+0x1e9/0x300 [dm_mod]
[  200.108847]  clone_endio+0xf4/0x1c0 [dm_mod]
[  200.109288]  bio_endio+0xe8/0x100
[  200.109621]  __dm_io_complete+0x1e9/0x300 [dm_mod]
[  200.110102]  clone_endio+0xf4/0x1c0 [dm_mod]
[  200.110543]  bio_endio+0xe8/0x100
[  200.110877]  brd_submit_bio+0xf8/0x123 [brd]
[  200.111310]  __submit_bio+0x7a/0x120
[  200.111670]  submit_bio_noacct_nocheck+0xb6/0x2a0
[  200.112138]  submit_bio_noacct+0x12e/0x3e0
[  200.112551]  dm_submit_bio_remap+0x46/0xa0 [dm_mod]
[  200.113036]  flush_expired_bios+0x28/0x2f [dm_delay]
[  200.113536]  process_one_work+0x1b4/0x320
[  200.113943]  worker_thread+0x45/0x3e0
[  200.114319]  ? rescuer_thread+0x380/0x380
[  200.114714]  kthread+0xc2/0x100
[  200.115035]  ? kthread_complete_and_exit+0x20/0x20
[  200.115517]  ret_from_fork+0x1f/0x30
[  200.115874]  </TASK>

The function super_written is obviously buggy, because it first wakes up a 
process and then calls bio_put(bio) - so the woken-up process is racing 
with bio_put(bio) and the result is that we attempt to free a bio into a 
destroyed bio set.

When I fix super_written, there are no longer any crashes. I'm posting a 
patch in the next email.

Mikulas


More information about the dm-devel mailing list