[dm-devel] [PATCH] dm-crypt: add the "high_priority" flag
yangerkun
yangerkun at huawei.com
Sat Mar 11 09:26:48 UTC 2023
在 2023/3/7 9:30, yangerkun 写道:
>
>
> 在 2023/3/7 3:01, Mikulas Patocka 写道:
>>
>>
>> On Mon, 6 Mar 2023, Mikulas Patocka wrote:
>>
>>>> Hi,
>>>>
>>>> Thanks a lot for your review!
>>>>
>>>> It's ok to fix the softlockup, but for async write encrypt,
>>>> kcryptd_crypt_write_io_submit will add bio to write_tree, and once we
>>>> call cond_resched before every kcryptd_io_write, the write performance
>>>> may be poor while we meet a high cpu usage scene.
>>>
>>> Hi
>>>
>>> To fix this problem, find the PID of the process "dmcrypt_write" and
>>> change its priority to -20, for example "renice -n -20 -p 34748".
>>>
>>> This is the proper way how to fix it; locking up the process for one
>>> second is not.
>>>
>>> We used to have high-priority workqueues by default, but it caused audio
>>> playback skipping, so we had to revert it - see
>>> f612b2132db529feac4f965f28a1b9258ea7c22b.
>>>
>>> Perhaps we should add an option to have high-priority kernel threads?
>>>
>>> Mikulas
>>
>> Here I'm sending a patch that makes it optional to raise the priority
>> of dm-crypt workqueues and the write thread. Test it, if it helps for
>> you.
>
> Thanks, will test it!
Sorry for the delay, the scene that report the softlockup cannot been
using(arm64 server), instead, we use a x86_64 server do the test.
Test step:
1. make every cpu busy, stress-ng -c 72 --timeout=360000
2. choose pmem as the disk, cryptsetup -c sm4-xts-plain64 --key-size=256
--hash sm3 luksFormat --sector-size 4096 --pbkdf=pbkdf2 /dev/pmem0
3. taskset -c 11 fio -filename=/dev/mapper/test -direct=1 -iodepth 64
-rw=randwrite -ioengine=libaio -bs=4k -runtime=60 -group_reporting
-name=sqe_100write_4k -thread -numjobs=1
Results:
origin(without any patch):
[root at fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test
-direct=1 -iodepth 64 -rw=randwrite -ioengine=libaio -bs=4k -runtime=60
-group_reporting -name=sqe_100write_4k -thread -numjobs=1
sqe_100write_4k: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W)
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=189MiB/s][w=48.3k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40205: Sat Mar 11
17:51:26 2023
write: IOPS=60.3k, BW=236MiB/s (247MB/s)(13.8GiB/60001msec); 0 zone
resets
slat (usec): min=2, max=14393, avg= 4.78, stdev= 9.83
clat (usec): min=26, max=44200, avg=1054.97, stdev=705.63
lat (usec): min=31, max=44204, avg=1059.84, stdev=706.58
clat percentiles (usec):
| 1.00th=[ 188], 5.00th=[ 359], 10.00th=[ 404], 20.00th=[ 469],
| 30.00th=[ 611], 40.00th=[ 717], 50.00th=[ 914], 60.00th=[ 1172],
| 70.00th=[ 1303], 80.00th=[ 1467], 90.00th=[ 1827], 95.00th=[ 2180],
| 99.00th=[ 3294], 99.50th=[ 4146], 99.90th=[ 5800], 99.95th=[ 8225],
| 99.99th=[14091]
bw ( KiB/s): min=161560, max=486960, per=100.00%, avg=241961.65,
stdev=65188.84, samples=119
iops : min=40390, max=121740, avg=60490.40, stdev=16297.21,
samples=119
lat (usec) : 50=0.03%, 100=0.33%, 250=1.22%, 500=21.77%, 750=19.12%
lat (usec) : 1000=10.40%
lat (msec) : 2=40.23%, 4=6.34%, 10=0.53%, 20=0.03%, 50=0.01%
cpu : usr=16.16%, sys=29.07%, ctx=1459171, majf=0, minf=32412
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
>=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
>=64=0.0%
issued rwts: total=0,3620430,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=236MiB/s (247MB/s), 236MiB/s-236MiB/s (247MB/s-247MB/s),
io=13.8GiB (14.8GB), run=60001-60001msec
origin + schedule everytime:
[root at fedora hulk-5.10]# git diff
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3d975db86434..17ddca293965 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1934,6 +1934,7 @@ static int dmcrypt_write(void *data)
io = crypt_io_from_node(rb_first(&write_tree));
rb_erase(&io->rb_node, &write_tree);
kcryptd_io_write(io);
+ cond_resched();
} while (!RB_EMPTY_ROOT(&write_tree));
blk_finish_plug(&plug);
}
[root at fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test
-direct=1 -iodepth 64 -rw=randwrite -ioengine=libaio -bs=4k -runtime=60
-group_reporting -name=sqe_100write_4k -thread -numjobs=1
sqe_100write_4k: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W)
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [w(1)][100.0%][w=197MiB/s][w=50.4k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40287: Sat Mar 11
17:56:25 2023
write: IOPS=59.9k, BW=234MiB/s (245MB/s)(13.7GiB/60001msec); 0 zone
resets
slat (usec): min=2, max=13251, avg= 5.19, stdev= 9.59
clat (usec): min=14, max=41490, avg=1062.91, stdev=655.31
lat (usec): min=26, max=41495, avg=1068.20, stdev=656.78
clat percentiles (usec):
| 1.00th=[ 269], 5.00th=[ 363], 10.00th=[ 412], 20.00th=[ 494],
| 30.00th=[ 644], 40.00th=[ 766], 50.00th=[ 979], 60.00th=[ 1172],
| 70.00th=[ 1303], 80.00th=[ 1450], 90.00th=[ 1827], 95.00th=[ 2147],
| 99.00th=[ 3097], 99.50th=[ 3785], 99.90th=[ 5145], 99.95th=[ 5735],
| 99.99th=[13698]
bw ( KiB/s): min=171976, max=562936, per=100.00%, avg=239748.22,
stdev=61262.50, samples=119
iops : min=42994, max=140734, avg=59937.04, stdev=15315.64,
samples=119
lat (usec) : 20=0.01%, 50=0.01%, 100=0.10%, 250=0.67%, 500=19.60%
lat (usec) : 750=18.69%, 1000=11.81%
lat (msec) : 2=42.65%, 4=6.05%, 10=0.40%, 20=0.02%, 50=0.01%
cpu : usr=15.94%, sys=28.99%, ctx=1564116, majf=0, minf=32437
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
>=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
>=64=0.0%
issued rwts: total=0,3592061,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=234MiB/s (245MB/s), 234MiB/s-234MiB/s (245MB/s-245MB/s),
io=13.7GiB (14.7GB), run=60001-60001msec
origin + schedule everytime + set nice:
[root at fedora hulk-5.10]# git diff
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 3d975db86434..f2135c223944 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1934,6 +1934,7 @@ static int dmcrypt_write(void *data)
io = crypt_io_from_node(rb_first(&write_tree));
rb_erase(&io->rb_node, &write_tree);
kcryptd_io_write(io);
+ cond_resched();
} while (!RB_EMPTY_ROOT(&write_tree));
blk_finish_plug(&plug);
}
@@ -3298,18 +3299,18 @@ static int crypt_ctr(struct dm_target *ti,
unsigned int argc, char **argv)
}
ret = -ENOMEM;
- cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM,
1, devname);
+ cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM |
WQ_HIGHPRI, 1, devname);
if (!cc->io_queue) {
ti->error = "Couldn't create kcryptd io queue";
goto bad;
}
if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
- cc->crypt_queue = alloc_workqueue("kcryptd/%s",
WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM,
+ cc->crypt_queue = alloc_workqueue("kcryptd/%s",
WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_HIGHPRI,
1, devname);
else
cc->crypt_queue = alloc_workqueue("kcryptd/%s",
- WQ_CPU_INTENSIVE |
WQ_MEM_RECLAIM | WQ_UNBOUND,
+ WQ_CPU_INTENSIVE |
WQ_MEM_RECLAIM | WQ_UNBOUND | WQ_HIGHPRI,
num_online_cpus(),
devname);
if (!cc->crypt_queue) {
ti->error = "Couldn't create kcryptd queue";
@@ -3326,6 +3327,7 @@ static int crypt_ctr(struct dm_target *ti,
unsigned int argc, char **argv)
ti->error = "Couldn't spawn write thread";
goto bad;
}
+ set_user_nice(cc->write_thread, MIN_NICE);
wake_up_process(cc->write_thread);
ti->num_flush_bios = 1;
[root at fedora hulk-5.10]# taskset -c 11 fio -filename=/dev/mapper/test
-direct=1 -iodepth 64 -rw=write -ioengine=libaio -bs=4k -runtime=60
-group_reporting -name=sqe_100write_4k -thread -numjobs=1
sqe_100write_4k: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B,
(T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.29
Starting 1 thread
Jobs: 1 (f=1): [W(1)][100.0%][w=132MiB/s][w=33.7k IOPS][eta 00m:00s]
sqe_100write_4k: (groupid=0, jobs=1): err= 0: pid=40483: Sat Mar 11
18:02:21 2023
write: IOPS=37.9k, BW=148MiB/s (155MB/s)(8884MiB/60001msec); 0 zone
resets
slat (usec): min=2, max=3105, avg=22.68, stdev=11.80
clat (usec): min=18, max=46657, avg=1664.96, stdev=604.02
lat (usec): min=39, max=46660, avg=1687.78, stdev=613.12
clat percentiles (usec):
| 1.00th=[ 383], 5.00th=[ 482], 10.00th=[ 594], 20.00th=[ 914],
| 30.00th=[ 1696], 40.00th=[ 1860], 50.00th=[ 1909], 60.00th=[ 1958],
| 70.00th=[ 1991], 80.00th=[ 2073], 90.00th=[ 2180], 95.00th=[ 2212],
| 99.00th=[ 2376], 99.50th=[ 2442], 99.90th=[ 4359], 99.95th=[ 4883],
| 99.99th=[12387]
bw ( KiB/s): min=115368, max=363976, per=100.00%, avg=151823.13,
stdev=44148.16, samples=119
iops : min=28842, max=90994, avg=37955.78, stdev=11037.04,
samples=119
lat (usec) : 20=0.01%, 50=0.01%, 100=0.06%, 250=0.23%, 500=7.38%
lat (usec) : 750=4.65%, 1000=9.11%
lat (msec) : 2=49.16%, 4=29.29%, 10=0.10%, 20=0.01%, 50=0.01%
cpu : usr=8.86%, sys=18.57%, ctx=1878960, majf=0, minf=401
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%,
>=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%,
>=64=0.0%
issued rwts: total=0,2274350,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=148MiB/s (155MB/s), 148MiB/s-148MiB/s (155MB/s-155MB/s),
io=8884MiB (9316MB), run=60001-60001msec
I have no idea why set nice will lead to a poor performance for fio...
But it seems OK that cond_resched every time since it won't result
performance descent. So, drop this patch and just take '[PATCH]
dm-crypt: add cond_resched() to dmcrypt_write()' you send?
>>
>> Mikulas
>>
>>
>>
>> From: Mikulas Patocka <mpatocka at redhat.com>
>>
>> It was reported that dm-crypt performs badly when the system is
>> loaded[1].
>> So I'm adding an option "high_priority" that sets the workqueues and the
>> write_thread to nice level -20. This used to be the default in the past,
>> but it caused audio skipping, so it had to be reverted - see the commit
>> f612b2132db529feac4f965f28a1b9258ea7c22b. This commit makes it optional,
>> so that normal users won't be harmed by it.
>>
>> [1]
>> https://listman.redhat.com/archives/dm-devel/2023-February/053410.html
>>
>> Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
>>
>> ---
>> Documentation/admin-guide/device-mapper/dm-crypt.rst | 5 +++
>> drivers/md/dm-crypt.c | 28
>> +++++++++++++------
>> 2 files changed, 25 insertions(+), 8 deletions(-)
>>
>> Index: linux-2.6/drivers/md/dm-crypt.c
>> ===================================================================
>> --- linux-2.6.orig/drivers/md/dm-crypt.c
>> +++ linux-2.6/drivers/md/dm-crypt.c
>> @@ -133,9 +133,9 @@ struct iv_elephant_private {
>> * and encrypts / decrypts at the same time.
>> */
>> enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
>> - DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD,
>> - DM_CRYPT_NO_READ_WORKQUEUE, DM_CRYPT_NO_WRITE_WORKQUEUE,
>> - DM_CRYPT_WRITE_INLINE };
>> + DM_CRYPT_SAME_CPU, DM_CRYPT_HIGH_PRIORITY,
>> + DM_CRYPT_NO_OFFLOAD, DM_CRYPT_NO_READ_WORKQUEUE,
>> + DM_CRYPT_NO_WRITE_WORKQUEUE, DM_CRYPT_WRITE_INLINE };
>> enum cipher_flags {
>> CRYPT_MODE_INTEGRITY_AEAD, /* Use authenticated mode for
>> cipher */
>> @@ -3087,7 +3087,7 @@ static int crypt_ctr_optional(struct dm_
>> struct crypt_config *cc = ti->private;
>> struct dm_arg_set as;
>> static const struct dm_arg _args[] = {
>> - {0, 8, "Invalid number of feature args"},
>> + {0, 9, "Invalid number of feature args"},
>> };
>> unsigned int opt_params, val;
>> const char *opt_string, *sval;
>> @@ -3114,6 +3114,8 @@ static int crypt_ctr_optional(struct dm_
>> else if (!strcasecmp(opt_string, "same_cpu_crypt"))
>> set_bit(DM_CRYPT_SAME_CPU, &cc->flags);
>> + else if (!strcasecmp(opt_string, "high_priority"))
>> + set_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags);
>> else if (!strcasecmp(opt_string, "submit_from_crypt_cpus"))
>> set_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
>> @@ -3352,18 +3354,22 @@ static int crypt_ctr(struct dm_target *t
>> }
>> ret = -ENOMEM;
>> - cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM,
>> 1, devname);
>> + cc->io_queue = alloc_workqueue("kcryptd_io/%s", WQ_MEM_RECLAIM |
>> + test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags) *
>> WQ_HIGHPRI,
>> + 1, devname);
>> if (!cc->io_queue) {
>> ti->error = "Couldn't create kcryptd io queue";
>> goto bad;
>> }
>> if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
>> - cc->crypt_queue = alloc_workqueue("kcryptd/%s",
>> WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM,
>> + cc->crypt_queue = alloc_workqueue("kcryptd/%s",
>> WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM |
>> + test_bit(DM_CRYPT_HIGH_PRIORITY,
>> &cc->flags) * WQ_HIGHPRI,
>> 1, devname);
>> else
>> cc->crypt_queue = alloc_workqueue("kcryptd/%s",
>> - WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM |
>> WQ_UNBOUND,
>> + WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM |
>> WQ_UNBOUND |
>> + test_bit(DM_CRYPT_HIGH_PRIORITY,
>> &cc->flags) * WQ_HIGHPRI,
>> num_online_cpus(), devname);
>> if (!cc->crypt_queue) {
>> ti->error = "Couldn't create kcryptd queue";
>> @@ -3380,6 +3386,8 @@ static int crypt_ctr(struct dm_target *t
>> ti->error = "Couldn't spawn write thread";
>> goto bad;
>> }
>> + if (test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags))
>> + set_user_nice(cc->write_thread, MIN_NICE);
>> ti->num_flush_bios = 1;
>> ti->limit_swap_bios = true;
>> @@ -3500,6 +3508,7 @@ static void crypt_status(struct dm_targe
>> num_feature_args += !!ti->num_discard_bios;
>> num_feature_args += test_bit(DM_CRYPT_SAME_CPU, &cc->flags);
>> + num_feature_args += test_bit(DM_CRYPT_HIGH_PRIORITY,
>> &cc->flags);
>> num_feature_args += test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
>> num_feature_args += test_bit(DM_CRYPT_NO_READ_WORKQUEUE,
>> &cc->flags);
>> num_feature_args += test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE,
>> &cc->flags);
>> @@ -3513,6 +3522,8 @@ static void crypt_status(struct dm_targe
>> DMEMIT(" allow_discards");
>> if (test_bit(DM_CRYPT_SAME_CPU, &cc->flags))
>> DMEMIT(" same_cpu_crypt");
>> + if (test_bit(DM_CRYPT_HIGH_PRIORITY, &cc->flags))
>> + DMEMIT(" high_priority");
>> if (test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags))
>> DMEMIT(" submit_from_crypt_cpus");
>> if (test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags))
>> @@ -3532,6 +3543,7 @@ static void crypt_status(struct dm_targe
>> DMEMIT_TARGET_NAME_VERSION(ti->type);
>> DMEMIT(",allow_discards=%c", ti->num_discard_bios ? 'y' : 'n');
>> DMEMIT(",same_cpu_crypt=%c", test_bit(DM_CRYPT_SAME_CPU,
>> &cc->flags) ? 'y' : 'n');
>> + DMEMIT(",high_priority=%c", test_bit(DM_CRYPT_HIGH_PRIORITY,
>> &cc->flags) ? 'y' : 'n');
>> DMEMIT(",submit_from_crypt_cpus=%c",
>> test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags) ?
>> 'y' : 'n');
>> DMEMIT(",no_read_workqueue=%c",
>> test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags) ?
>> @@ -3659,7 +3671,7 @@ static void crypt_io_hints(struct dm_tar
>> static struct target_type crypt_target = {
>> .name = "crypt",
>> - .version = {1, 24, 0},
>> + .version = {1, 25, 0},
>> .module = THIS_MODULE,
>> .ctr = crypt_ctr,
>> .dtr = crypt_dtr,
>> Index: linux-2.6/Documentation/admin-guide/device-mapper/dm-crypt.rst
>> ===================================================================
>> --- linux-2.6.orig/Documentation/admin-guide/device-mapper/dm-crypt.rst
>> +++ linux-2.6/Documentation/admin-guide/device-mapper/dm-crypt.rst
>> @@ -113,6 +113,11 @@ same_cpu_crypt
>> The default is to use an unbound workqueue so that encryption work
>> is automatically balanced between available CPUs.
>> +high_priority
>> + Set dm-crypt workqueues and the writer thread to high priority. This
>> + improves throughput and latency of dm-crypt while degrading general
>> + responsiveness of the system.
>> +
>> submit_from_crypt_cpus
>> Disable offloading writes to a separate thread after encryption.
>> There are some situations where offloading write bios from the
>>
More information about the dm-devel
mailing list