[dm-devel] A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018

Guoqing Jiang guoqing.jiang at linux.dev
Thu Nov 3 07:28:55 UTC 2022



On 11/3/22 11:47 AM, Guoqing Jiang wrote:
>> [   78.491429] <TASK>
>> [   78.491640]  clone_endio+0xf4/0x1c0 [dm_mod]
>> [   78.492072]  clone_endio+0xf4/0x1c0 [dm_mod]
>
> The clone_endio belongs to "clone" target_type.

Hmm, could be the "clone_endio" from dm.c instead of dm-clone-target.c.

>
>> [   78.492505] __submit_bio+0x76/0x120
>> [   78.492859]  submit_bio_noacct_nocheck+0xb6/0x2a0
>> [   78.493325]  flush_expired_bios+0x28/0x2f [dm_delay]
>
> This is "delay" target_type. Could you shed light on how the two targets
> connect with dm-raid? And I have shallow knowledge about dm ...
>
>> [   78.493808] process_one_work+0x1b4/0x300
>> [   78.494211]  worker_thread+0x45/0x3e0
>> [   78.494570]  ? rescuer_thread+0x380/0x380
>> [   78.494957]  kthread+0xc2/0x100
>> [   78.495279]  ? kthread_complete_and_exit+0x20/0x20
>> [   78.495743]  ret_from_fork+0x1f/0x30
>> [   78.496096]  </TASK>
>> [   78.496326] Modules linked in: brd dm_delay dm_raid dm_mod 
>> af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev 
>> tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon 
>> rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button 
>> mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy 
>> async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod 
>> t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg 
>> scsi_common [last unloaded: brd]
>> [   78.500425] CR2: 0000000000000000
>> [   78.500752] ---[ end trace 0000000000000000 ]---
>> [   78.501214] RIP: 0010:mempool_free+0x47/0x80
>
> BTW, is the mempool_free from endio -> dec_count -> complete_io?

I guess it is "mempool_free(io, &io->client->pool)", and the pool is 
freed by
dm_io_client_destroy, and seems dm-raid is not responsible for either create
pool or destroy pool.

> And io which caused the crash is from dm_io -> async_io / sync_io
>  -> dispatch_io, seems dm-raid1 can call it instead of dm-raid, so I
> suppose the io is for mirror image. 

The io should be from another path (dm_submit_bio -> 
dm_split_and_process_bio
-> __split_and_process_bio -> __map_bio which sets "bi_end_io = 
clone_endio").

My guess is, there is racy condition between "lvchange --rebuild" and 
raid_dtr since
it was reproduced by running cmd in loop.

Anyway, we can revert the mentioned commit and go back to Neil's 
solution [1],
but I'd like to reproduce it and learn DM a bit.

[1]. 
https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470

Thanks,
Guoqing



More information about the dm-devel mailing list