[dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)
Melvin Vermeeren
vermeeren at vermwa.re
Mon Apr 19 18:29:17 UTC 2021
Note: This was originally posted on cryptsetup GitLab.
Note: Reposting here for better visibility as it appears to be a kernel bug.
Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639
Issue description
-----------------
With a Seagate FireCuda 520 2TB NVMe SSD running in PCIe 3.0 x4 mode (my
motherboard does not have PCIe 4.0), discards through the `dm-integrity` layer
are extremely slow, to the point of being almost (and in some cases completely)
unusable.
This is so slow that using the `discard` option on swap is not possible:
discarding 32GiB of swap takes around 3 minutes, causing timeouts during boot,
which in turn cause various other services to fail, resulting in a drop to the
emergency shell.
`blkdiscard` directly on the NVMe device takes roughly 10 seconds for the
entire 2TB, but through `dm-integrity` the rate is approximately 10GiB per
minute, meaning over 3 hours to discard the entire 2TB. Normal read and write
operations are unaffected and perform well, easily reaching 2GiB/s through the
entire stack: `disk dm-integrity mdadm luks lvm ext4`.
Checking kernel thread usage in htop, quite a few `dm-integrity-offload`
threads sit in the `D` state with `0.0` CPU usage while discarding, which is
rather odd. No integrity threads are actually doing work, and read-write disk
usage measured with `dstat` is below 1MiB/s.
To detail the above, `dstat` shows a very clear timing pattern: 2 seconds of
0k writes, then 1 second of 512k writes, repeating. Possibly a timeout in a
lock somewhere, or some other problematic locking situation?
Steps for reproducing the issue
-------------------------------
1. Create two 10G partitions on the SSD.
2. Set up `dm-integrity` on one of them and open the device with
`--allow-discards`.
3. `blkdiscard` both partitions.
* Raw partition is done instantly.
* Integrity partition takes around a minute.
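The steps above can be sketched as follows. This is a hedged outline, not a
tested script: the device nodes and the mapping name `test-integ` are
placeholders you must substitute, and all commands require root and will
destroy data on the target partitions.

```shell
#!/bin/sh
# Hypothetical test partitions -- substitute your own.
RAW=/dev/nvme0n1p5
INTEG=/dev/nvme0n1p6

# Set up standalone dm-integrity with 4096-byte sectors on one partition
# and open it with discards passed through.
integritysetup format --sector-size 4096 "$INTEG"
integritysetup open --allow-discards "$INTEG" test-integ

# Discard both and compare timings.
time blkdiscard "$RAW"                   # raw partition: near-instant
time blkdiscard /dev/mapper/test-integ   # through dm-integrity: ~1 minute

integritysetup close test-integ
```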
Additional info
---------------
The NVMe device is formatted with native 4096-byte sectors and the
`dm-integrity` layer also uses 4096-byte sectors.
Debian bullseye (testing), kernel 5.10.0-6-rt-amd64 (5.10.28-1). The same
issue occurred when testing with an Arch Linux live ISO, which uses kernel
5.11.x. Cryptsetup package version 2.3.5.
On another server system (IBM POWER9, ppc64le) with a SAS 3.0 SSD, discard
works properly at more than acceptable speeds and shows significant CPU usage
while discarding. The affected machine is a regular Intel amd64 desktop
system.
Debug log
---------
Nothing actually fails; dmesg and syslog show no issues or warnings at all,
so I am not sure what to include.
Only appears to affect NVMe
---------------------------
Further tests on the same machine show that a SATA SSD is not affected by
this issue and discards at high speed. This appears to be an NVMe-specific
bug:
Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639#note_555208783
If there is anything I can do to help, feel free to let me know.
Note that I am not subscribed to dm-devel, so please CC me directly.
Thanks,
--
Melvin Vermeeren
Systems engineer