[dm-devel] Integrity discard/trim extremely slow on NVMe SSD storage (~10GiB/minute)
Mikulas Patocka
mpatocka at redhat.com
Mon Apr 26 15:33:32 UTC 2021
On Mon, 19 Apr 2021, Melvin Vermeeren wrote:
> Note: This was originally posted on cryptsetup GitLab.
> Note: Reposting here for better visibility as it appears to be a kernel bug.
> Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639
>
> Issue description
> -----------------
>
> With a Seagate FireCuda 520 2TB NVMe SSD running in PCIe 3.0 x4 mode (my
> motherboard does not have PCIe 4.0), discards through `dm-integrity`
> layer are extremely slow to the point of being almost unusable or in
> some cases fully unusable.
>
> This is so slow that having the `discard` option on swap is not
> possible, as it takes around 3 minutes to complete for 32GiB swap
> causing timeouts during boot which in turn causes various other services
> to fail resulting in a drop to the emergency shell.
>
> `blkdiscard` directly to NVMe device takes I think 10 sec or so for the
> entire 2TB, but through `dm-integrity` the rate is approx 10GiB per
> minute, meaning over 3 hours to discard the entire 2TB. Normal read and
> write operations are not affected and are high performance, easily
> reaching 2GiB/s through the entire layer: `disk dm-integrity mdadm luks
> lvm ext4`.
>
> Checking the kernel thread usage in htop, quite a few
> `dm-integrity-offload` threads are in the `D` state with `0.0` CPU usage
> when discarding, which is rather odd. No integrity threads are actually
> working and read-write disk usage measured with `dstat` is not even
> 1MiB/s.
>
> To detail the above, `dstat` shows extremely clear timings: 2 seconds 0k
> write, 1 second 512k write, repeat. Possible timeout in locks somewhere
> or other problematic lock situation?
>
> Steps for reproducing the issue
> -------------------------------
>
> 1. Create two 10G partitions on SSD.
> 2. Set up `dm-integrity` on one of these and open the device with
> `--allow-discards`.
> 3. `blkdiscard` both partitions.
> * Raw partition is done instantly.
> * Integrity partition takes around a minute.
>
> Additional info
> ---------------
>
> The NVMe device is formatted to native 4096 byte sectors and the
> `dm-integrity` layer also uses 4096 byte sectors.
>
> Debian bullseye (testing), kernel 5.10.0-6-rt-amd64 5.10.28-1. Same issue
> occurred during testing with Arch Linux liveiso which is kernel 5.11.x.
> Cryptsetup package version 2.3.5.
>
> On another server system (IBM POWER9, ppc64le) with SAS 3.0 SSD discard is
> working properly at more than acceptable speeds, showing significant CPU usage
> while discarding. In this case it is a regular Intel amd64 desktop system.
>
> Debug log
> ---------
>
> Nothing really fails, dmesg and syslog show no issues/warnings at all, not
> sure what to include.
>
> Only appears to affect NVMe
> ---------------------------
>
> Further tests on the same machine show that SATA SSD is not affected by this
> issue and discards at high performance. Appears to be an NVMe-specific bug:
> Ref: https://gitlab.com/cryptsetup/cryptsetup/-/issues/639#note_555208783
I tried it on my nvme device (Samsung SSD 960 EVO 500GB) and I could
discard 32GB in 5 seconds.
I assume that it is specific to the nvme device you are using. The device
is perhaps slow due to a mix of discard+read+write requests that
dm-integrity generates.
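
To put the two sets of figures side by side, here is a rough
back-of-the-envelope sketch. It uses only the approximate numbers quoted
in this thread; it assumes "32GB" means decimal gigabytes and "2TB" means
decimal terabytes, and the variable names are made up for illustration.

```python
# Rough comparison of the discard rates reported in this thread.
# All inputs are approximate figures from the messages, not measurements.
GIB = 2**30   # binary gibibyte
GB = 10**9    # decimal gigabyte (assumption for "32GB")
TB = 10**12   # decimal terabyte (assumption for "2TB")

# "approx 10GiB per minute" through dm-integrity on the FireCuda 520
integrity_rate = 10 * GIB / 60   # bytes per second

# "32GB in 5 seconds" on the Samsung SSD 960 EVO
samsung_rate = 32 * GB / 5       # bytes per second


def seconds_to_discard(size_bytes, rate):
    """Time to discard size_bytes at a constant rate in bytes/s."""
    return size_bytes / rate


full_disk_hours = seconds_to_discard(2 * TB, integrity_rate) / 3600
swap_minutes = seconds_to_discard(32 * GIB, integrity_rate) / 60
ratio = samsung_rate / integrity_rate

print(f"2TB at the reported rate: {full_disk_hours:.1f} hours")
print(f"32GiB swap at the reported rate: {swap_minutes:.1f} minutes")
print(f"Samsung rate / FireCuda rate: about {ratio:.0f}x")
```

Under those assumptions the reported rate matches the "over 3 hours" and
"around 3 minutes" estimates in the report, and the two devices differ by
well over an order of magnitude.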
> If there is anything I can do to help feel free to let me know.
> Note that I am not subscribed to dm-devel, please CC me directly.
>
> Thanks,
Could you try it on other nvme disks?
Mikulas