[dm-devel] [PATCH -next v2 00/28] md: synchronize io with array reconfiguration

Yu Kuai yukuai1 at huaweicloud.com
Tue Sep 26 00:55:53 UTC 2023


Hi,

On 2023/09/25 23:45, Song Liu wrote:
> Hi Kuai,
> 
> Thanks for the patchset!
> 
> I got the following panic with mdadm test 23rdev-lifetime.
> Could you please look into it?
> 
> I pushed the test code to this branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-test-28

Thanks for the test. I know where the problem is now: mddev is
dereferenced before the NULL check.

I'll fix this in the next version.
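A minimal userspace sketch of the bug class described above, not the actual md fix (which is not shown in this thread). It assumes a hypothetical `struct toy_rdev` whose `mddev` pointer becomes NULL once the member device is unbound, and a hypothetical `-ENODEV` return for that case. The point is only the ordering: the NULL check must come before anything, including a mutex lock, touches `mddev`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real md structures. */
struct toy_mddev {
	pthread_mutex_t suspend_mutex;
	int suspended;
};

struct toy_rdev {
	struct toy_mddev *mddev; /* NULL once the rdev is unbound from its array */
};

/* Hypothetical sysfs-style store handler for a member device. */
static int toy_rdev_attr_store(struct toy_rdev *rdev)
{
	struct toy_mddev *mddev;

	assert(rdev != NULL);      /* the rdev itself is always valid here */
	mddev = rdev->mddev;

	/*
	 * Check first. Locking mddev->suspend_mutex while mddev is NULL
	 * reads memory at a small offset from address 0, the same kind of
	 * fault as the 0xa8 access reported in __mutex_lock().
	 */
	if (mddev == NULL)
		return -ENODEV;

	pthread_mutex_lock(&mddev->suspend_mutex);
	mddev->suspended++;
	/* ... apply the requested change to the member device ... */
	mddev->suspended--;
	pthread_mutex_unlock(&mddev->suspend_mutex);
	return 0;
}
```

An unbound member now gets an error back instead of faulting inside the lock code.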

Thanks,
Kuai

> 
> Thanks,
> Song
> 
> 
> [  173.143010] ==================================================================
> [  173.144256] BUG: KASAN: null-ptr-deref in __mutex_lock+0xc0/0x920
> [  173.145232] Read of size 8 at addr 00000000000000a8 by task test/1215
> [  173.146138]
> [  173.146375] CPU: 26 PID: 1215 Comm: test Not tainted 6.6.0-rc2+ #8
> [  173.147254] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [  173.148840] Call Trace:
> [  173.149202]  <TASK>
> [  173.149531]  dump_stack_lvl+0xb5/0x100
> [  173.150093]  ? __pfx_dump_stack_lvl+0x10/0x10
> [  173.150724]  ? _printk+0xac/0xf0
> [  173.151251]  ? lock_acquired+0xff/0x680
> [  173.151852]  print_report+0xe6/0x510
> [  173.152372]  ? __might_resched+0x1a1/0x3d0
> [  173.152997]  ? __mutex_lock+0xc0/0x920
> [  173.153566]  kasan_report+0x119/0x150
> [  173.154114]  ? lock_acquire+0x18a/0x390
> [  173.154667]  ? __mutex_lock+0xc0/0x920
> [  173.155225]  ? mddev_suspend+0xbc/0x260
> [  173.155799]  __mutex_lock+0xc0/0x920
> [  173.156332]  ? lock_acquire+0x18a/0x390
> [  173.156928]  ? kernfs_find_and_get_ns+0x4c/0xb0
> [  173.157578]  ? __pfx___mutex_lock+0x10/0x10
> [  173.158177]  ? down_read+0x6b2/0x800
> [  173.158696]  ? lock_is_held_type+0xdb/0x150
> [  173.159300]  mddev_suspend+0xbc/0x260
> [  173.159832]  ? __pfx_lock_release+0x10/0x10
> [  173.160427]  ? lock_is_held_type+0xdb/0x150
> [  173.161074]  ? __pfx_mddev_suspend+0x10/0x10
> [  173.161698]  rdev_attr_store+0x5ba/0x600
> [  173.162282]  ? __pfx_sysfs_kf_write+0x10/0x10
> [  173.162915]  kernfs_fop_write_iter+0x1d1/0x280
> [  173.163595]  vfs_write+0x45d/0x5d0
> [  173.164113]  ? __pfx_vfs_write+0x10/0x10
> [  173.164709]  ? __pfx_lock_release+0x10/0x10
> [  173.165352]  ksys_write+0xed/0x1a0
> [  173.165912]  ? __pfx_ksys_write+0x10/0x10
> [  173.166501]  ? __audit_syscall_entry+0x1cf/0x200
> [  173.167191]  ? syscall_enter_from_user_mode+0x181/0x220
> [  173.168034]  do_syscall_64+0x43/0x90
> [  173.168588]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  173.169355] RIP: 0033:0x7f4e65ced648
> [  173.169830] md: could not open device unknown-block(7,0).
> [  173.169914] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 55 6f 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [  173.173324] RSP: 002b:00007ffe9a2ac128 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  173.174398] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f4e65ced648
> [  173.175405] RDX: 0000000000000007 RSI: 0000561ae26e29d0 RDI: 0000000000000001
> [  173.176416] RBP: 0000561ae26e29d0 R08: 000000000000000a R09: 00007f4e65d80620
> [  173.177417] R10: 000000000000000a R11: 0000000000000246 R12: 00007f4e65fc06e0
> [  173.178418] R13: 0000000000000007 R14: 00007f4e65fbb880 R15: 0000000000000007
> [  173.179441]  </TASK>
> [  173.179775] ==================================================================
> [  173.180838] Disabling lock debugging due to kernel taint
> [  173.181662] BUG: kernel NULL pointer dereference, address: 00000000000000a8
> [  173.182654] #PF: supervisor read access in kernel mode
> [  173.183408] #PF: error_code(0x0000) - not-present page
> [  173.184152] PGD 0 P4D 0
> [  173.184531] Oops: 0000 [#1] PREEMPT SMP KASAN PTI
> [  173.185224] CPU: 26 PID: 1215 Comm: test Tainted: G    B   6.6.0-rc2+ #8
> [  173.186320] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> [  173.187912] RIP: 0010:__mutex_lock+0xc0/0x920
> [  173.188557] Code: 00 e8 24 f3 77 fe 2e 2e 2e 31 c0 48 c7 c7 80 c7 c5 89 e8 03 01 bf fe 83 3d ec e0 27 07 00 75 15 49 8d 7c 24 68 e8 30 02 bf fe <4d> 39 64 24 68 0f 85 00 08 00 00 bf 01 00 00 00 e8 5b e7 76 fe 4d
> [  173.191203] RSP: 0018:ffff8881b18c7a20 EFLAGS: 00010286
> [  173.191958] RAX: ffff8881b0ae4001 RBX: 0000000000000000 RCX: ffffffff810e0df1
> [  173.192968] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff8900ea40
> [  173.193976] RBP: ffff8881b18c7b50 R08: ffffffff8900ea47 R09: 1ffffffff1201d48
> [  173.194986] R10: dffffc0000000000 R11: fffffbfff1201d49 R12: 0000000000000040
> [  173.196263] R13: ffffffff823e61cc R14: 0000000000000000 R15: 0000000000000000
> [  173.197274] FS:  00007f4e66b6e740(0000) GS:ffff888dfd200000(0000) knlGS:0000000000000000
> [  173.198466] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  173.199316] CR2: 00000000000000a8 CR3: 00000001b191e005 CR4: 0000000000370ee0
> [  173.200327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  173.201382] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  173.202430] Call Trace:
> [  173.202810]  <TASK>
> [  173.203173]  ? __die_body+0x63/0xb0
> [  173.203678]  ? page_fault_oops+0x2f3/0x440
> [  173.204338]  ? __pfx_page_fault_oops+0x10/0x10
> [  173.204981]  ? vprintk_emit+0x455/0x520
> [  173.205593]  ? __pfx_vprintk_emit+0x10/0x10
> [  173.206276]  ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
> [  173.207068]  ? do_user_addr_fault+0x796/0x840
> [  173.207694]  ? _printk+0xac/0xf0
> [  173.208188]  ? __pfx_do_user_addr_fault+0x10/0x10
> [  173.208879]  ? rcu_is_watching+0x30/0x60
> [  173.209475]  ? exc_page_fault+0x7d/0x290
> [  173.210043]  ? asm_exc_page_fault+0x22/0x30
> [  173.210639]  ? mddev_suspend+0xbc/0x260
> [  173.211294]  ? add_taint+0x41/0x90
> [  173.211798]  ? __mutex_lock+0xc0/0x920
> [  173.212352]  ? lock_acquire+0x18a/0x390
> [  173.212914]  ? kernfs_find_and_get_ns+0x4c/0xb0
> [  173.213623]  ? __pfx___mutex_lock+0x10/0x10
> [  173.214243]  ? down_read+0x6b2/0x800
> [  173.214773]  ? lock_is_held_type+0xdb/0x150
> [  173.215374]  mddev_suspend+0xbc/0x260
> [  173.215941]  ? __pfx_lock_release+0x10/0x10
> [  173.216541]  ? lock_is_held_type+0xdb/0x150
> [  173.217148]  ? __pfx_mddev_suspend+0x10/0x10
> [  173.217776]  rdev_attr_store+0x5ba/0x600
> [  173.218343]  ? __pfx_sysfs_kf_write+0x10/0x10
> [  173.218977]  kernfs_fop_write_iter+0x1d1/0x280
> [  173.219618]  vfs_write+0x45d/0x5d0
> [  173.220126]  ? __pfx_vfs_write+0x10/0x10
> [  173.220689]  ? __pfx_lock_release+0x10/0x10
> [  173.221342]  ksys_write+0xed/0x1a0
> [  173.221850]  ? __pfx_ksys_write+0x10/0x10
> [  173.222421]  ? __audit_syscall_entry+0x1cf/0x200
> [  173.223090]  ? syscall_enter_from_user_mode+0x181/0x220
> [  173.223845]  do_syscall_64+0x43/0x90
> [  173.224362]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [  173.225083] RIP: 0033:0x7f4e65ced648
> [  173.225599] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 55 6f 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [  173.228199] RSP: 002b:00007ffe9a2ac128 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [  173.229267] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f4e65ced648
> [  173.230273] RDX: 0000000000000007 RSI: 0000561ae26e29d0 RDI: 0000000000000001
> [  173.231274] RBP: 0000561ae26e29d0 R08: 000000000000000a R09: 00007f4e65d80620
> [  173.232323] R10: 000000000000000a R11: 0000000000000246 R12: 00007f4e65fc06e0
> [  173.233323] R13: 0000000000000007 R14: 00007f4e65fbb880 R15: 0000000000000007
> [  173.234333]  </TASK>
> [  173.234657] Modules linked in:
> [  173.235118] CR2: 00000000000000a8
> [  173.235601] ---[ end trace 0000000000000000 ]---
> [  173.236270] RIP: 0010:__mutex_lock+0xc0/0x920
> [  173.236906] Code: 00 e8 24 f3 77 fe 2e 2e 2e 31 c0 48 c7 c7 80 c7 c5 89 e8 03 01 bf fe 83 3d ec e0 27 07 00 75 15 49 8d 7c 24 68 e8 30 02 bf fe <4d> 39 64 24 68 0f 85 00 08 00 00 bf 01 00 00 00 e8 5b e7 76 fe 4d
> [  173.239538] RSP: 0018:ffff8881b18c7a20 EFLAGS: 00010286
> [  173.240286] RAX: ffff8881b0ae4001 RBX: 0000000000000000 RCX: ffffffff810e0df1
> [  173.241293] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff8900ea40
> [  173.242342] RBP: ffff8881b18c7b50 R08: ffffffff8900ea47 R09: 1ffffffff1201d48
> [  173.243343] R10: dffffc0000000000 R11: fffffbfff1201d49 R12: 0000000000000040
> [  173.244346] R13: ffffffff823e61cc R14: 0000000000000000 R15: 0000000000000000
> [  173.245384] FS:  00007f4e66b6e740(0000) GS:ffff888dfd200000(0000) knlGS:0000000000000000
> [  173.246548] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  173.247362] CR2: 00000000000000a8 CR3: 00000001b191e005 CR4: 0000000000370ee0
> [  173.248371] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  173.249390] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  173.250395] Kernel panic - not syncing: Fatal exception
> [  173.251612] Kernel Offset: disabled
> [  173.252133] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 
> 
> On Sun, Aug 27, 2023 at 7:04 PM Yu Kuai <yukuai1 at huaweicloud.com> wrote:
>>
>> From: Yu Kuai <yukuai3 at huawei.com>
>>
>> Changes in v2:
>>   - rebased on the latest md-next;
>>   - removed some follow-up cleanup patches; they will be sent after
>>   this patchset.
>>
>> After four previous patchsets of preparatory work, this patchset implements
>> a new version of mddev_suspend(). With the new APIs:
>>   - reconfig_mutex is not required;
>>   - the odd logic where suspending the array holds 'reconfig_mutex' so
>>     that md_check_recovery() can update the superblock is no longer needed;
>>   - the special handling for raid456, 'pers->prepare_suspend', is no
>>     longer needed;
>>   - it is safe to call at any time once mddev is allocated, and it is
>>     designed to be used from slow paths where the array configuration
>>     is changed.
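A toy userspace model of what a suspend API with these properties can look like, assuming a plain mutex and condition variable; it is not the md implementation. All names (`struct toy_array`, `toy_suspend()`, `toy_io_begin()`, and so on) are hypothetical. Suspend uses its own lock rather than `reconfig_mutex`, nests through a counter, blocks new I/O, and waits for in-flight I/O to drain.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical toy model; not the md implementation. */
struct toy_array {
	pthread_mutex_t lock;  /* protects the fields below; NOT reconfig_mutex */
	pthread_cond_t cond;   /* signalled when I/O drains or the array resumes */
	int suspended;         /* nesting count of active suspenders */
	int active_io;         /* requests currently in flight */
};

static void toy_array_init(struct toy_array *a)
{
	pthread_mutex_init(&a->lock, NULL);
	pthread_cond_init(&a->cond, NULL);
	a->suspended = 0;
	a->active_io = 0;
}

/* I/O path: wait while the array is suspended, then count the request. */
static void toy_io_begin(struct toy_array *a)
{
	pthread_mutex_lock(&a->lock);
	while (a->suspended > 0)
		pthread_cond_wait(&a->cond, &a->lock);
	a->active_io++;
	pthread_mutex_unlock(&a->lock);
}

/* I/O completion: wake any suspender once the last request finishes. */
static void toy_io_end(struct toy_array *a)
{
	pthread_mutex_lock(&a->lock);
	assert(a->active_io > 0);
	if (--a->active_io == 0)
		pthread_cond_broadcast(&a->cond);
	pthread_mutex_unlock(&a->lock);
}

/* Stop new I/O, then wait for in-flight I/O; nested calls just bump the count. */
static void toy_suspend(struct toy_array *a)
{
	pthread_mutex_lock(&a->lock);
	a->suspended++;
	while (a->active_io > 0)
		pthread_cond_wait(&a->cond, &a->lock);
	pthread_mutex_unlock(&a->lock);
}

/* Drop one suspend reference; let blocked I/O proceed after the last one. */
static void toy_resume(struct toy_array *a)
{
	pthread_mutex_lock(&a->lock);
	assert(a->suspended > 0);
	if (--a->suspended == 0)
		pthread_cond_broadcast(&a->cond);
	pthread_mutex_unlock(&a->lock);
}
```

Because the counter and lock live in the array object itself, a caller only needs the object to be allocated, which mirrors the "safe to call at any time once mddev is allocated" point above.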
>>
>> It then uses the new APIs to replace:
>>
>> mddev_lock
>> mddev_suspend or not
>> // array reconfiguration
>> mddev_resume or not
>> mddev_unlock
>>
>> With:
>>
>> mddev_suspend
>> mddev_lock
>> // array reconfiguration
>> mddev_unlock
>> mddev_resume
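A sketch of the new ordering, assuming two separate hypothetical locks (`suspend_mutex`, `reconfig_mutex`) and a trace string that records the order of the steps. Taking the suspend first means the suspend step itself never runs under `reconfig_mutex`, matching the point above that `reconfig_mutex` is no longer required to suspend. `struct toy_md`, `toy_reconfigure()` and `toy_change_level()` are illustrative names, not md functions, and the suspend here only counts; it does not drain I/O.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for an array with two independent locks. */
struct toy_md {
	pthread_mutex_t suspend_mutex;  /* serializes suspend/resume */
	pthread_mutex_t reconfig_mutex; /* serializes configuration changes */
	int suspended;
	char trace[64];                 /* comma-separated order of steps */
};

static void trace_step(struct toy_md *m, const char *step)
{
	if (m->trace[0] != '\0')
		strncat(m->trace, ",", sizeof(m->trace) - strlen(m->trace) - 1);
	strncat(m->trace, step, sizeof(m->trace) - strlen(m->trace) - 1);
}

static void toy_md_suspend(struct toy_md *m)
{
	pthread_mutex_lock(&m->suspend_mutex);
	m->suspended++;  /* a real suspend would also wait for in-flight I/O */
	pthread_mutex_unlock(&m->suspend_mutex);
	trace_step(m, "suspend");
}

static void toy_md_resume(struct toy_md *m)
{
	trace_step(m, "resume");
	pthread_mutex_lock(&m->suspend_mutex);
	assert(m->suspended > 0);
	m->suspended--;
	pthread_mutex_unlock(&m->suspend_mutex);
}

/* Hypothetical configuration change, e.g. a level switch. */
static int toy_change_level(struct toy_md *m)
{
	trace_step(m, "change");
	return 0;
}

/* New ordering: suspend -> lock -> change -> unlock -> resume. */
static int toy_reconfigure(struct toy_md *m, int (*change)(struct toy_md *))
{
	int ret;

	toy_md_suspend(m);
	pthread_mutex_lock(&m->reconfig_mutex);
	trace_step(m, "lock");
	ret = change(m);
	trace_step(m, "unlock");
	pthread_mutex_unlock(&m->reconfig_mutex);
	toy_md_resume(m);
	return ret;
}
```

Calling `toy_reconfigure(&m, toy_change_level)` on a zeroed, initialized `struct toy_md` leaves `m.trace` as `suspend,lock,change,unlock,resume`.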
>>
>> However, this change is not possible for raid5 and md-cluster in some
>> corner cases; there, mddev_suspend/resume() is replaced with the
>> quiesce() callback, which suspends the array as well.
>>
>> This patchset has been tested in my VM with the mdadm test suite on loop
>> devices, except for the 10ddf tests (which already failed before this
>> patchset).
>>
>> Many more cleanups will follow this patchset.
>>
>> Yu Kuai (28):
>>    md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
>>    md: use 'mddev->suspended' for is_md_suspended()
>>    md: add new helpers to suspend/resume array
>>    md: add new helpers to suspend/resume and lock/unlock array
>>    md: use new apis to suspend array for suspend_lo/hi_store()
>>    md: use new apis to suspend array for level_store()
>>    md: use new apis to suspend array for serialize_policy_store()
>>    md/dm-raid: use new apis to suspend array
>>    md/md-bitmap: use new apis to suspend array for location_store()
>>    md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
>>    md/raid5-cache: use new apis to suspend array for
>>      r5c_disable_writeback_async()
>>    md/raid5-cache: use new apis to suspend array for
>>      r5c_journal_mode_store()
>>    md/raid5: use new apis to suspend array for raid5_store_stripe_size()
>>    md/raid5: use new apis to suspend array for raid5_store_skip_copy()
>>    md/raid5: use new apis to suspend array for
>>      raid5_store_group_thread_cnt()
>>    md/raid5: use new apis to suspend array for
>>      raid5_change_consistency_policy()
>>    md/raid5: replace suspend with quiesce() callback
>>    md: quiesce before md_kick_rdev_from_array() for md-cluster
>>    md: use new apis to suspend array for ioctls involed array
>>      reconfiguration
>>    md: use new apis to suspend array for adding/removing rdev from
>>      state_store()
>>    md: use new apis to suspend array for bind_rdev_to_array()
>>    md: use new apis to suspend array related to serial pool in
>>      state_store()
>>    md: use new apis to suspend array in backlog_store()
>>    md: suspend array in md_start_sync() if array need reconfiguration
>>    md: cleanup mddev_create/destroy_serial_pool()
>>    md/md-linear: cleanup linear_add()
>>    md: remove old apis to suspend the array
>>    md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
>>
>>   drivers/md/dm-raid.c       |   8 +-
>>   drivers/md/md-autodetect.c |   4 +-
>>   drivers/md/md-bitmap.c     |  18 ++-
>>   drivers/md/md-linear.c     |   2 -
>>   drivers/md/md.c            | 250 ++++++++++++++++++++++---------------
>>   drivers/md/md.h            |  52 ++++++--
>>   drivers/md/raid5-cache.c   |  61 +++++----
>>   drivers/md/raid5.c         |  56 ++++-----
>>   8 files changed, 253 insertions(+), 198 deletions(-)
>>
>> --
>> 2.39.2
>>
> 


