[dm-devel] dm-crypt: fix softlockup in dmcrypt_write
yangerkun
yangerkun at huaweicloud.com
Tue Feb 28 01:40:56 UTC 2023
On 2023/2/28 2:06, Mike Snitzer wrote:
> On Mon, Feb 27 2023 at 1:03P -0500,
> Mike Snitzer <snitzer at kernel.org> wrote:
>
>> On Mon, Feb 27 2023 at 12:55P -0500,
>> Mike Snitzer <snitzer at kernel.org> wrote:
>>
>>> On Sun, Feb 26 2023 at 8:31P -0500,
>>> yangerkun <yangerkun at huaweicloud.com> wrote:
>>>
>>>>
>>>>
>>>> On 2023/2/26 10:01, Bart Van Assche wrote:
>>>>> On 2/22/23 19:19, yangerkun wrote:
>>>>>> @@ -1924,6 +1926,10 @@ static int dmcrypt_write(void *data)
>>>>>> BUG_ON(rb_parent(write_tree.rb_node));
>>>>>> + if (time_is_before_jiffies(start_time + HZ)) {
>>>>>> + schedule();
>>>>>> + start_time = jiffies;
>>>>>> + }
>>>>>
>>>>> Why schedule() instead of cond_resched()?
>>>>
>>>> cond_resched may not really schedule, which may trigger the problem too, but
>>>> it seems after 1 second, it may never happen?
>>>
>>> I had the same question as Bart when reviewing your homegrown
>>> conditional schedule(). Hopefully you can reproduce this issue? If
>>> so, please see if simply using cond_resched() fixes the issue.
>>
>> This seems like a more appropriate patch:
>>
>> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
>> index 87c5706131f2..faba1be572f9 100644
>> --- a/drivers/md/dm-crypt.c
>> +++ b/drivers/md/dm-crypt.c
>> @@ -1937,6 +1937,7 @@ static int dmcrypt_write(void *data)
>> io = crypt_io_from_node(rb_first(&write_tree));
>> rb_erase(&io->rb_node, &write_tree);
>> kcryptd_io_write(io);
>> + cond_resched();
>> } while (!RB_EMPTY_ROOT(&write_tree));
>> blk_finish_plug(&plug);
>> }
>
>
> or:
>
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index 87c5706131f2..3ba2fd3e4358 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -1934,6 +1934,7 @@ static int dmcrypt_write(void *data)
> */
> blk_start_plug(&plug);
> do {
> + cond_resched();
> io = crypt_io_from_node(rb_first(&write_tree));
> rb_erase(&io->rb_node, &write_tree);
> kcryptd_io_write(io);
Hi,
Thanks a lot for your review!
That would fix the softlockup, but for asynchronous write encryption,
kcryptd_crypt_write_io_submit adds each bio to write_tree, and if we
call cond_resched before every kcryptd_io_write, write performance
may suffer under high CPU usage.
kcryptd_crypt_write_io_submit wakes up write_thread when write_tree is
empty, and dmcrypt_write detaches the old write_tree before submitting
its bios, so write_tree cannot accumulate too many bios. That is why I
chose to yield the CPU before the 'while' loop that submits the bios...
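
As a rough userspace illustration of that trade-off (not the kernel code itself), the sketch below counts how many yield points each placement produces while draining the same work. Everything in it is hypothetical: SKETCH_HZ, sketch_submit and the two drain_* helpers are invented names, the clock is a simulated tick counter rather than jiffies, and a "yield point" only marks where cond_resched() or schedule() would sit.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HZ 100UL	/* hypothetical: simulated ticks per second */

struct sketch_stats {
	unsigned long submitted;	/* bios "submitted" */
	unsigned long yields;		/* yield points reached */
};

/* Stand-in for kcryptd_io_write(): assume each submission costs one tick. */
static void sketch_submit(unsigned long *now, struct sketch_stats *st)
{
	assert(now != NULL && st != NULL);
	(*now)++;
	st->submitted++;
}

/* Placement A: a yield point before every single submission. */
static struct sketch_stats drain_yield_per_bio(size_t batches, size_t bios_per_batch)
{
	struct sketch_stats st = { 0, 0 };
	unsigned long now = 0;

	for (size_t b = 0; b < batches; b++) {
		for (size_t i = 0; i < bios_per_batch; i++) {
			st.yields++;	/* where cond_resched() would sit */
			sketch_submit(&now, &st);
		}
	}
	return st;
}

/*
 * Placement B: check once per detached batch, before the inner loop,
 * and yield only if more than SKETCH_HZ ticks passed since the last yield.
 */
static struct sketch_stats drain_yield_per_interval(size_t batches, size_t bios_per_batch)
{
	struct sketch_stats st = { 0, 0 };
	unsigned long now = 0, start = 0;

	for (size_t b = 0; b < batches; b++) {
		if (now - start > SKETCH_HZ) {	/* mirrors time_is_before_jiffies(start + HZ) */
			st.yields++;		/* where schedule() would sit */
			start = now;
		}
		for (size_t i = 0; i < bios_per_batch; i++)
			sketch_submit(&now, &st);
	}
	return st;
}
```

Calling both helpers with the same arguments, for example drain_yield_per_bio(50, 10) and drain_yield_per_interval(50, 10), submits the same number of bios but reaches far fewer yield points in the second case. Whether that difference matters in practice depends on how cheap cond_resched() is when no reschedule is pending, which this sketch does not model.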
Thanks,
Kun.