[dm-devel] A crash caused by the commit 0dd84b319352bb8ba64752d4e45396d8b13e6018
Guoqing Jiang
guoqing.jiang at linux.dev
Thu Nov 3 07:28:55 UTC 2022
On 11/3/22 11:47 AM, Guoqing Jiang wrote:
>> [ 78.491429] <TASK>
>> [ 78.491640] clone_endio+0xf4/0x1c0 [dm_mod]
>> [ 78.492072] clone_endio+0xf4/0x1c0 [dm_mod]
>
> The clone_endio belongs to the "clone" target_type.
Hmm, it could be the "clone_endio" from dm.c rather than the one in dm-clone-target.c.
>
>> [ 78.492505] __submit_bio+0x76/0x120
>> [ 78.492859] submit_bio_noacct_nocheck+0xb6/0x2a0
>> [ 78.493325] flush_expired_bios+0x28/0x2f [dm_delay]
>
> This is the "delay" target_type. Could you shed light on how the two
> targets connect with dm-raid? I have only shallow knowledge of dm ...
>
>> [ 78.493808] process_one_work+0x1b4/0x300
>> [ 78.494211] worker_thread+0x45/0x3e0
>> [ 78.494570] ? rescuer_thread+0x380/0x380
>> [ 78.494957] kthread+0xc2/0x100
>> [ 78.495279] ? kthread_complete_and_exit+0x20/0x20
>> [ 78.495743] ret_from_fork+0x1f/0x30
>> [ 78.496096] </TASK>
>> [ 78.496326] Modules linked in: brd dm_delay dm_raid dm_mod
>> af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev
>> tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon
>> rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button
>> mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy
>> async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod
>> t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg
>> scsi_common [last unloaded: brd]
>> [ 78.500425] CR2: 0000000000000000
>> [ 78.500752] ---[ end trace 0000000000000000 ]---
>> [ 78.501214] RIP: 0010:mempool_free+0x47/0x80
>
> BTW, is the mempool_free from endio -> dec_count -> complete_io?
I guess it is "mempool_free(io, &io->client->pool)"; that pool is freed by
dm_io_client_destroy, and it seems dm-raid is responsible for neither
creating nor destroying the pool.
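The suspected lifetime problem can be sketched as a small userspace model in C. This is a toy under stated assumptions, not the kernel's mempool or dm-io code: toy_pool, toy_client, toy_io, toy_client_destroy() and toy_complete_io() are hypothetical names. The model assumes that destroying the client frees the pool's reserve array and leaves the pointer NULL, so a completion that later returns its io to io->client->pool would write through NULL. That would fit the CR2 of 0 in the trace, but the trace alone does not prove it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model, not the kernel's mempool: assumes teardown
 * frees the reserve array and leaves the pointer NULL. */
struct toy_pool {
    void **elements;   /* reserve array; NULL after teardown */
    int curr_nr;       /* elements currently parked in the reserve */
    int min_nr;        /* reserve capacity */
};

/* Hypothetical stand-in for a dm-io client that owns its pool. */
struct toy_client {
    struct toy_pool pool;
};

/* Hypothetical in-flight io that remembers the client it came from. */
struct toy_io {
    struct toy_client *client;
};

static void toy_pool_init(struct toy_pool *p, int min_nr)
{
    assert(min_nr > 0);
    p->elements = calloc((size_t)min_nr, sizeof(void *));
    p->curr_nr = 0;
    p->min_nr = min_nr;
}

/* Teardown: release parked elements, then the reserve array itself. */
static void toy_client_destroy(struct toy_client *c)
{
    while (c->pool.curr_nr > 0)
        free(c->pool.elements[--c->pool.curr_nr]);
    free(c->pool.elements);
    c->pool.elements = NULL;
}

/* Completion: hand the io back to io->client->pool.
 * Returns 0 on success, or -1 where a real pool would have written
 * through a NULL reserve array (the modelled oops). */
static int toy_complete_io(struct toy_io *io)
{
    struct toy_pool *p = &io->client->pool;

    if (p->curr_nr < p->min_nr) {
        if (p->elements == NULL)
            return -1;
        p->elements[p->curr_nr++] = io;   /* park in the reserve */
        return 0;
    }
    free(io);                             /* reserve full: release */
    return 0;
}
```

Calling toy_complete_io() while the client is alive parks the io in the reserve; calling it after toy_client_destroy() hits the NULL array and returns -1, standing in for the oops.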
> And the io which caused the crash comes from dm_io -> async_io / sync_io
> -> dispatch_io; it seems dm-raid1, rather than dm-raid, can call it, so I
> suppose the io is for a mirror image.
The io should come from another path (dm_submit_bio ->
dm_split_and_process_bio -> __split_and_process_bio -> __map_bio, which
sets "bi_end_io = clone_endio").
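One way to picture what the "bi_end_io = clone_endio" assignment implies for the trace is the userspace model below. It is a sketch only: struct toy_bio, toy_map_bio(), toy_clone_endio() and the other toy_ names are hypothetical stand-ins, not dm.c code. It assumes two stacked DM devices, each cloning the bio it receives and installing a clone_endio-like callback on the clone; completing the lowest clone then runs that callback twice, nested, which would be one possible explanation for the two clone_endio frames in the trace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bio: only a completion callback and a back pointer. */
struct toy_bio {
    void (*bi_end_io)(struct toy_bio *bio);
    struct toy_bio *orig;     /* bio this clone was made from, or NULL */
};

static int clone_endio_calls;   /* how many times toy_clone_endio ran */
static int top_completed;       /* set when the top-level bio completes */

/* Complete a bio by running whatever callback it carries. */
static void toy_bio_endio(struct toy_bio *bio)
{
    assert(bio != NULL);
    if (bio->bi_end_io)
        bio->bi_end_io(bio);
}

/* Clone completion: count the frame, then complete the original bio,
 * which may run the next layer's callback (nested frames). */
static void toy_clone_endio(struct toy_bio *clone)
{
    clone_endio_calls++;
    if (clone->orig)
        toy_bio_endio(clone->orig);
}

/* Mirrors the idea of the __map_bio assignment: the clone gets the
 * clone_endio-like callback and remembers its original. */
static void toy_map_bio(struct toy_bio *orig, struct toy_bio *clone)
{
    clone->orig = orig;
    clone->bi_end_io = toy_clone_endio;
}

/* Callback of the bio submitted from above the DM stack (hypothetical). */
static void toy_top_endio(struct toy_bio *bio)
{
    (void)bio;
    top_completed = 1;
}
```

Completing the bottom clone runs the callback once per stacked layer before the top-level callback fires; nothing here models how dm.c accounts for or frees the clones.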
My guess is that there is a race between "lvchange --rebuild" and
raid_dtr, since the crash was reproduced by running the command in a loop.
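Why a loop helps can be illustrated by enumerating orderings in a toy model. The sketch below is purely hypothetical: the I/O side is reduced to sequential submit/complete pairs, teardown to a single destroy step, and a submit arriving after destroy is assumed to be rejected harmlessly; none of this is taken from lvchange or raid_dtr. Only orderings where destroy lands between a submit and its completion are harmful, so a single run can easily miss the window while repeated runs are more likely to hit it.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of replaying one interleaving (hypothetical model). */
enum toy_outcome { TOY_OK, TOY_LATE_COMPLETION };

/*
 * The I/O side performs n_io sequential submit/complete pairs
 * (2 * n_io steps); a single destroy step runs just before I/O step
 * number destroy_at (0..2*n_io, where 2*n_io means "after everything").
 * A completion arriving after destroy while its io was in flight is the
 * harmful case; a submit after destroy is assumed to be rejected.
 */
static enum toy_outcome toy_replay(int n_io, int destroy_at)
{
    assert(n_io > 0 && destroy_at >= 0 && destroy_at <= 2 * n_io);
    bool alive = true;
    bool in_flight = false;

    for (int step = 0; step <= 2 * n_io; step++) {
        if (step == destroy_at)
            alive = false;                 /* teardown in the hypothesis */
        if (step == 2 * n_io)
            break;                         /* only the trailing destroy slot */
        if (step % 2 == 0) {               /* submit */
            if (alive)
                in_flight = true;
        } else if (in_flight) {            /* complete */
            if (!alive)
                return TOY_LATE_COMPLETION;
            in_flight = false;
        }
    }
    return TOY_OK;
}

/* Count harmful positions among the 2*n_io + 1 possible destroy slots. */
static int toy_count_harmful(int n_io)
{
    int harmful = 0;
    for (int at = 0; at <= 2 * n_io; at++)
        if (toy_replay(n_io, at) == TOY_LATE_COMPLETION)
            harmful++;
    return harmful;
}
```

With one io, only the ordering submit, destroy, complete is harmful, so toy_count_harmful(1) reports one of three orderings.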
Anyway, we could revert the mentioned commit and go back to Neil's
solution [1], but I'd like to reproduce the crash first and learn a bit
more about DM.
[1] https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470
Thanks,
Guoqing