[dm-devel] [PATCH] dm-crypt: add the "high_priority" flag

yangerkun yangerkun at huawei.com
Sat Mar 11 09:26:48 UTC 2023



在 2023/3/7 9:30, yangerkun 写道:
> 
> 
> 在 2023/3/7 3:01, Mikulas Patocka 写道:
>>
>>
>> On Mon, 6 Mar 2023, Mikulas Patocka wrote:
>>
>>>> Hi,
>>>>
>>>> Thanks a lot for your review!
>>>>
>>>> It's ok to fix the softlockup, but for async write encrypt,
>>>> kcryptd_crypt_write_io_submit will add bio to write_tree, and once we
>>>> call cond_resched before every kcryptd_io_write, the write performance
>>>> may be poor while we meet a high cpu usage scene.
>>>
>>> Hi
>>>
>>> To fix this problem, find the PID of the process "dmcrypt_write" and
>>> change its priority to -20, for example "renice -n -20 -p 34748".
>>>
>>> This is the proper way how to fix it; locking up the process for one
>>> second is not.
>>>
>>> We used to have high-priority workqueues by default, but it caused audio
>>> playback skipping, so we had to revert it - see
>>> f612b2132db529feac4f965f28a1b9258ea7c22b.
>>>
>>> Perhaps we should add an option to have high-priority kernel threads?
>>>
>>> Mikulas
>>
>> Here I'm sending a patch that makes it optional to raise the priority
>> of dm-crypt workqueues and the write thread. Test it, if it helps for 
>> you.
> 
> Thanks, will test it!

Sorry for the delay, the scene that report the softlockup cannot been 
using(arm64 server), instead, we use a x86_64 server do the test.

Test step:
1. make every cpu busy, stress-ng -c 72 --timeout=360000
2. choose pmem as the disk, cryptsetup -c sm4-xts-plain64 --key-size=256 
--hash sm3 luksFormat --sector-size 4096 --pbkdf=pbkdf2 /dev/pmem0
3. taskset -c 11 fio -filename=/dev/mapper/test -direct=1 -iodepth 64 
-rw=randwrite -ioengine=libaio -bs=4k  -runtime=60 -group_reporting 
-name=sqe_100write_4k -thread -numjobs=1

Results:

origin(without any patch):
[root at fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test 
-direct=1 -iodepth 64 -rw=randwrite -ioengine=libaio -bs=4k  -runtime=60 
-group_reporting -name=sqe_100write_4k -thread -numjobs=1
sqe_100write_4k: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=189MiB/s][w=48.3k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40205: Sat Mar 11 
17:51:26 2023
   write: IOPS=60.3k, BW=236MiB/s (247MB/s)(13.8GiB/60001msec); 0 zone 
resets
     slat (usec): min=2, max=14393, avg= 4.78, stdev= 9.83
     clat (usec): min=26, max=44200, avg=1054.97, stdev=705.63
      lat (usec): min=31, max=44204, avg=1059.84, stdev=706.58
     clat percentiles (usec):
      |  1.00th=[  188],  5.00th=[  359], 10.00th=[  404], 20.00th=[  469],
      | 30.00th=[  611], 40.00th=[  717], 50.00th=[  914], 60.00th=[ 1172],
      | 70.00th=[ 1303], 80.00th=[ 1467], 90.00th=[ 1827], 95.00th=[ 2180],
      | 99.00th=[ 3294], 99.50th=[ 4146], 99.90th=[ 5800], 99.95th=[ 8225],
      | 99.99th=[14091]
    bw (  KiB/s): min=161560, max=486960, per=100.00%, avg=241961.65, 
stdev=65188.84, samples=119
    iops        : min=40390, max=121740, avg=60490.40, stdev=16297.21, 
samples=119
   lat (usec)   : 50=0.03%, 100=0.33%, 250=1.22%, 500=21.77%, 750=19.12%
   lat (usec)   : 1000=10.40%
   lat (msec)   : 2=40.23%, 4=6.34%, 10=0.53%, 20=0.03%, 50=0.01%
   cpu          : usr=16.16%, sys=29.07%, ctx=1459171, majf=0, minf=32412
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, 
 >=64=100.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, 
 >=64=0.0%
      issued rwts: total=0,3620430,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s), 
io=13.8GiB (14.8GB), run=60001-60001msec

origin + schedule everytime:

[root at fedora hulk-5.10]# git diff
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3d975db86434..17ddca293965 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1934,6 +1934,7 @@ static int dmcrypt_write(void *data)
                         io = crypt_io_from_node(rb_first(&write_tree));
                         rb_erase(&io->rb_node, &write_tree);
                         kcryptd_io_write(io);
+                       cond_resched();
                 } while (!RB_EMPTY_ROOT(&write_tree));
                 blk_finish_plug(&plug);
         }


[root at fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test 
-direct=1 -iodepth 64 -rw=randwrite -ioengine=libaio -bs=4k  -runtime=60 
-group_reporting -name=sqe_100write_4k -thread -numjobs=1
sqe_100write_4k: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=197MiB/s][w=50.4k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40287: Sat Mar 11 
17:56:25 2023
   write: IOPS=59.9k, BW=234MiB/s (245MB/s)(13.7GiB/60001msec); 0 zone 
resets
     slat (usec): min=2, max=13251, avg= 5.19, stdev= 9.59
     clat (usec): min=14, max=41490, avg=1062.91, stdev=655.31
      lat (usec): min=26, max=41495, avg=1068.20, stdev=656.78
     clat percentiles (usec):
      |  1.00th=[  269],  5.00th=[  363], 10.00th=[  412], 20.00th=[  494],
      | 30.00th=[  644], 40.00th=[  766], 50.00th=[  979], 60.00th=[ 1172],
      | 70.00th=[ 1303], 80.00th=[ 1450], 90.00th=[ 1827], 95.00th=[ 2147],
      | 99.00th=[ 3097], 99.50th=[ 3785], 99.90th=[ 5145], 99.95th=[ 5735],
      | 99.99th=[13698]
    bw (  KiB/s): min=171976, max=562936, per=100.00%, avg=239748.22, 
stdev=61262.50, samples=119
    iops        : min=42994, max=140734, avg=59937.04, stdev=15315.64, 
samples=119
   lat (usec)   : 20=0.01%, 50=0.01%, 100=0.10%, 250=0.67%, 500=19.60%
   lat (usec)   : 750=18.69%, 1000=11.81%
   lat (msec)   : 2=42.65%, 4=6.05%, 10=0.40%, 20=0.02%, 50=0.01%
   cpu          : usr=15.94%, sys=28.99%, ctx=1564116, majf=0, minf=32437
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, 
 >=64=100.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, 
 >=64=0.0%
      issued rwts: total=0,3592061,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   WRITE: bw=234MiB/s (245MB/s), 234MiB/s-234MiB/s (245MB/s-245MB/s), 
io=13.7GiB (14.7GB), run=60001-60001msec

origin + schedule everytime + set nice:

[root at fedora hulk-5.10]# git diff
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3d975db86434..f2135c223944 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1934,6 +1934,7 @@ static int dmcrypt_write(void *data)
                         io = crypt_io_from_node(rb_first(&write_tree));
                         rb_erase(&io->rb_node, &write_tree);
                         kcryptd_io_write(io);
+                       cond_resched();
                 } while (!RB_EMPTY_ROOT(&write_tree));
                 blk_finish_plug(&plug);
         }
@@ -3298,18 +3299,18 @@ static int crypt_ctr(struct dm_target *ti, 
unsigned int argc, char **argv)
         }

         ret = -ENOMEM;
-       cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM, 
1, devname);
+       cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM | 
WQ_HIGHPRI, 1, devname);
         if (!cc->io_queue) {
                 ti->error = "Couldn't create kcryptd io queue";
                 goto bad;
         }

         if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
-               cc->crypt_queue = alloc_workqueue("kcryptd/%s", 
WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM,
+               cc->crypt_queue = alloc_workqueue("kcryptd/%s", 
WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_HIGHPRI,
                                                   1, devname);
         else
                 cc->crypt_queue = alloc_workqueue("kcryptd/%s",
-                                                 WQ_CPU_INTENSIVE | 
WQ_MEM_RECLAIM | WQ_UNBOUND,
+                                                 WQ_CPU_INTENSIVE | 
WQ_MEM_RECLAIM | WQ_UNBOUND | WQ_HIGHPRI,
                                                   num_online_cpus(), 
devname);
         if (!cc->crypt_queue) {
                 ti->error = "Couldn't create kcryptd queue";
@@ -3326,6 +3327,7 @@ static int crypt_ctr(struct dm_target *ti, 
unsigned int argc, char **argv)
                 ti->error = "Couldn't spawn write thread";
                 goto bad;
         }
+       set_user_nice(cc->write_thread, MIN_NICE);
         wake_up_process(cc->write_thread);

         ti->num_flush_bios = 1;


[root at fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test 
-direct=1 -iodepth 64 -rw=write -ioengine=libaio -bs=4k  -runtime=60 
-group_reporting -name=sqe_100write_4k -thread -numjobs=1
sqe_100write_4k: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, 
(T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [W(1)][100.0%][w=132MiB/s][w=33.7k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40483: Sat Mar 11 
18:02:21 2023
   write: IOPS=37.9k, BW=148MiB/s (155MB/s)(8884MiB/60001msec); 0 zone 
resets
     slat (usec): min=2, max=3105, avg=22.68, stdev=11.80
     clat (usec): min=18, max=46657, avg=1664.96, stdev=604.02
      lat (usec): min=39, max=46660, avg=1687.78, stdev=613.12
     clat percentiles (usec):
      |  1.00th=[  383],  5.00th=[  482], 10.00th=[  594], 20.00th=[  914],
      | 30.00th=[ 1696], 40.00th=[ 1860], 50.00th=[ 1909], 60.00th=[ 1958],
      | 70.00th=[ 1991], 80.00th=[ 2073], 90.00th=[ 2180], 95.00th=[ 2212],
      | 99.00th=[ 2376], 99.50th=[ 2442], 99.90th=[ 4359], 99.95th=[ 4883],
      | 99.99th=[12387]
    bw (  KiB/s): min=115368, max=363976, per=100.00%, avg=151823.13, 
stdev=44148.16, samples=119
    iops        : min=28842, max=90994, avg=37955.78, stdev=11037.04, 
samples=119
   lat (usec)   : 20=0.01%, 50=0.01%, 100=0.06%, 250=0.23%, 500=7.38%
   lat (usec)   : 750=4.65%, 1000=9.11%
   lat (msec)   : 2=49.16%, 4=29.29%, 10=0.10%, 20=0.01%, 50=0.01%
   cpu          : usr=8.86%, sys=18.57%, ctx=1878960, majf=0, minf=401
   IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, 
 >=64=100.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, 
 >=64=0.0%
      issued rwts: total=0,2274350,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   WRITE: bw=148MiB/s (155MB/s), 148MiB/s-148MiB/s (155MB/s-155MB/s), 
io=8884MiB (9316MB), run=60001-60001msec


I have no idea why set nice will lead to a poor performance for fio... 
But it seems OK that cond_resched every time since it won't result 
performance descent. So, drop this patch and just take '[PATCH] 
dm-crypt: add cond_resched() to dmcrypt_write()' you send?


>>
>> Mikulas
>>
>>
>>
>> From: Mikulas Patocka <mpatocka at redhat.com>
>>
>> It was reported that dm-crypt performs badly when the system is 
>> loaded[1].
>> So I'm adding an option "high_priority" that sets the workqueues and the
>> write_thread to nice level -20. This used to be the default in the past,
>> but it caused audio skipping, so it had to be reverted - see the commit
>> f612b2132db529feac4f965f28a1b9258ea7c22b. This commit makes it optional,
>> so that normal users won't be harmed by it.
>>
>> [1] 
>> https://listman.redhat.com/archives/dm-devel/2023-February/053410.html
>>
>> Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
>>
>> ---
>>   Documentation/admin-guide/device-mapper/dm-crypt.rst |    5 +++
>>   drivers/md/dm-crypt.c                                |   28 
>> +++++++++++++------
>>   2 files changed, 25 insertions(+), 8 deletions(-)
>>
>> Index: linux-2.6/drivers/md/dm-crypt.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/md/dm-crypt.c
>> +++ linux-2.6/drivers/md/dm-crypt.c
>> @@ -133,9 +133,9 @@ struct iv_elephant_private {
>>    * and encrypts / decrypts at the same time.
>>    */
>>   enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
>> -         DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD,
>> -         DM_CRYPT_NO_READ_WORKQUEUE, DM_CRYPT_NO_WRITE_WORKQUEUE,
>> -         DM_CRYPT_WRITE_INLINE };
>> +         DM_CRYPT_SAME_CPU, DM_CRYPT_HIGH_PRIORITY,
>> +         DM_CRYPT_NO_OFFLOAD, DM_CRYPT_NO_READ_WORKQUEUE,
>> +         DM_CRYPT_NO_WRITE_WORKQUEUE, DM_CRYPT_WRITE_INLINE };
>>   enum cipher_flags {
>>       CRYPT_MODE_INTEGRITY_AEAD,    /* Use authenticated mode for 
>> cipher */
>> @@ -3087,7 +3087,7 @@ static int crypt_ctr_optional(struct dm_
>>       struct crypt_config *cc = ti->private;
>>       struct dm_arg_set as;
>>       static const struct dm_arg _args[] = {
>> -        {0, 8, "Invalid number of feature args"},
>> +        {0, 9, "Invalid number of feature args"},
>>       };
>>       unsigned int opt_params, val;
>>       const char *opt_string, *sval;
>> @@ -3114,6 +3114,8 @@ static int crypt_ctr_optional(struct dm_
>>           else if (!strcasecmp(opt_string, "same_cpu_crypt"))
>>               set_bit(DM_CRYPT_SAME_CPU, &cc->flags);
>> +        else if (!strcasecmp(opt_string, "high_priority"))
>> +            set_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags);
>>           else if (!strcasecmp(opt_string, "submit_from_crypt_cpus"))
>>               set_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
>> @@ -3352,18 +3354,22 @@ static int crypt_ctr(struct dm_target *t
>>       }
>>       ret = -ENOMEM;
>> -    cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM, 
>> 1, devname);
>> +    cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM |
>> +                       test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags) * 
>> WQ_HIGHPRI,
>> +                       1, devname);
>>       if (!cc->io_queue) {
>>           ti->error = "Couldn't create kcryptd io queue";
>>           goto bad;
>>       }
>>       if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
>> -        cc->crypt_queue = alloc_workqueue("kcryptd/%s", 
>> WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM,
>> +        cc->crypt_queue = alloc_workqueue("kcryptd/%s", 
>> WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM |
>> +                          test_bit(DM_CRYPT_HIGH_PRIORITY, 
>> &cc->flags) * WQ_HIGHPRI,
>>                             1, devname);
>>       else
>>           cc->crypt_queue = alloc_workqueue("kcryptd/%s",
>> -                          WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | 
>> WQ_UNBOUND,
>> +                          WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | 
>> WQ_UNBOUND |
>> +                          test_bit(DM_CRYPT_HIGH_PRIORITY, 
>> &cc->flags) * WQ_HIGHPRI,
>>                             num_online_cpus(), devname);
>>       if (!cc->crypt_queue) {
>>           ti->error = "Couldn't create kcryptd queue";
>> @@ -3380,6 +3386,8 @@ static int crypt_ctr(struct dm_target *t
>>           ti->error = "Couldn't spawn write thread";
>>           goto bad;
>>       }
>> +    if (test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags))
>> +        set_user_nice(cc->write_thread, MIN_NICE);
>>       ti->num_flush_bios = 1;
>>       ti->limit_swap_bios = true;
>> @@ -3500,6 +3508,7 @@ static void crypt_status(struct dm_targe
>>           num_feature_args += !!ti->num_discard_bios;
>>           num_feature_args += test_bit(DM_CRYPT_SAME_CPU, &cc->flags);
>> +        num_feature_args += test_bit(DM_CRYPT_HIGH_PRIORITY, 
>> &cc->flags);
>>           num_feature_args += test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
>>           num_feature_args += test_bit(DM_CRYPT_NO_READ_WORKQUEUE, 
>> &cc->flags);
>>           num_feature_args += test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, 
>> &cc->flags);
>> @@ -3513,6 +3522,8 @@ static void crypt_status(struct dm_targe
>>                   DMEMIT(" allow_discards");
>>               if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
>>                   DMEMIT(" same_cpu_crypt");
>> +            if (test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags))
>> +                DMEMIT(" high_priority");
>>               if (test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags))
>>                   DMEMIT(" submit_from_crypt_cpus");
>>               if (test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags))
>> @@ -3532,6 +3543,7 @@ static void crypt_status(struct dm_targe
>>           DMEMIT_TARGET_NAME_VERSION(ti->type);
>>           DMEMIT(",allow_discards=%c", ti->num_discard_bios ? 'y' : 'n');
>>           DMEMIT(",same_cpu_crypt=%c", test_bit(DM_CRYPT_SAME_CPU, 
>> &cc->flags) ? 'y' : 'n');
>> +        DMEMIT(",high_priority=%c", test_bit(DM_CRYPT_HIGH_PRIORITY, 
>> &cc->flags) ? 'y' : 'n');
>>           DMEMIT(",submit_from_crypt_cpus=%c", 
>> test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags) ?
>>                  'y' : 'n');
>>           DMEMIT(",no_read_workqueue=%c", 
>> test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags) ?
>> @@ -3659,7 +3671,7 @@ static void crypt_io_hints(struct dm_tar
>>   static struct target_type crypt_target = {
>>       .name   = "crypt",
>> -    .version = {1, 24, 0},
>> +    .version = {1, 25, 0},
>>       .module = THIS_MODULE,
>>       .ctr    = crypt_ctr,
>>       .dtr    = crypt_dtr,
>> Index: linux-2.6/Documentation/admin-guide/device-mapper/dm-crypt.rst
>> ===================================================================
>> --- linux-2.6.orig/Documentation/admin-guide/device-mapper/dm-crypt.rst
>> +++ linux-2.6/Documentation/admin-guide/device-mapper/dm-crypt.rst
>> @@ -113,6 +113,11 @@ same_cpu_crypt
>>       The default is to use an unbound workqueue so that encryption work
>>       is automatically balanced between available CPUs.
>> +high_priority
>> +    Set dm-crypt workqueues and the writer thread to high priority. This
>> +    improves throughput and latency of dm-crypt while degrading general
>> +    responsiveness of the system.
>> +
>>   submit_from_crypt_cpus
>>       Disable offloading writes to a separate thread after encryption.
>>       There are some situations where offloading write bios from the
>>



More information about the dm-devel mailing list