[dm-devel] [git pull] device mapper changes for 5.9

Ignat Korchagin ignat at cloudflare.com
Tue Aug 18 21:12:40 UTC 2020


Just to bring in some more context: the primary trigger that made us look
into this was high p99 read latency in a random-read workload on modern-ish
SATA SSD and NVMe disks. That is, on average things looked fine, but a
fraction of requests, each needing only a small chunk of data fetched from
the disk quickly, were stalled for an unreasonable amount of time.

Most modern IO-intensive workloads probably have good provisions for
dealing with slow writes, and when writing data we usually care more about
average throughput, i.e. having enough throughput to write all the incoming
data to disk without losing it. In contrast, there are many modern IO
workloads which require small chunks of data to be fetched fast
(distributed key-value stores, caching systems etc), so the emphasis there
is on read latency rather than write throughput. And this is where we think
the synchronous behaviour provides the most benefit.
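
For anyone who wants to experiment with it, below is a minimal sketch (in
Python, shelling out to dmsetup) of creating a dm-crypt mapping with the
optional no_read_workqueue/no_write_workqueue flags that request the
synchronous behaviour. The backing device, mapping name and all-zero key
are placeholders for illustration only, and it assumes root privileges and
a kernel that understands these dm-crypt options:

# Sketch only: build a dm-crypt table that bypasses the kcryptd workqueues
# via the optional no_read_workqueue / no_write_workqueue flags and load it
# with dmsetup. Backing device, mapping name and key are placeholders.
import subprocess

backing_dev = "/dev/sdb"   # hypothetical backing device
name = "fast-crypt"        # hypothetical mapping name
key = "00" * 32            # placeholder key; never use a zero key for real data

# Size of the backing device in 512-byte sectors, as dmsetup expects.
sectors = int(subprocess.check_output(
    ["blockdev", "--getsz", backing_dev]).strip())

# dm-crypt table: start length target cipher key iv_offset device offset
#                 #opt_params opt_params...
table = (f"0 {sectors} crypt aes-xts-plain64 {key} 0 {backing_dev} 0 "
         "2 no_read_workqueue no_write_workqueue")

subprocess.run(["dmsetup", "create", name, "--table", table], check=True)
print(f"/dev/mapper/{name} created with the crypt workqueues bypassed")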

Additionally, anyone who cares about latency will not be using HDDs for
such a workload, and HDD IO latency is much higher than CPU scheduling
latency anyway. So it does not make much sense to do any benchmarks on
HDDs, as the HDD latency will likely hide any improvement or degradation
from the synchronous IO handling in dm-crypt.

But even latency-wise, in our testing with larger block sizes (>2M) the
synchronous IO (reads and writes) may show worse performance, and without
fully understanding why, we're probably not ready yet to recommend it as a
default.
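
To illustrate what we are measuring, here is a rough Python sketch of the
kind of direct-IO random-read latency probe we run at a given block size.
The device path and numbers below are placeholders, and a proper fio run is
obviously better:

# Sketch only: measure p50/p99 latency of random O_DIRECT reads of size BS
# from a block device. Run as root against an idle, disposable device.
import mmap
import os
import random
import time

DEV = "/dev/mapper/test-crypt"   # hypothetical dm-crypt device
BS = 2 * 1024 * 1024             # block size under test, e.g. 4k ... 2M+
READS = 1000

fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)   # bypass the page cache
size = os.lseek(fd, 0, os.SEEK_END)
buf = mmap.mmap(-1, BS)                        # page-aligned buffer for O_DIRECT

lat = []
for _ in range(READS):
    off = random.randrange(0, size - BS, BS)   # keep offsets block-aligned
    t0 = time.perf_counter()
    os.preadv(fd, [buf], off)
    lat.append(time.perf_counter() - t0)
os.close(fd)

lat.sort()
print("bs=%d p50=%.3fms p99=%.3fms" %
      (BS, 1e3 * lat[len(lat) // 2], 1e3 * lat[int(len(lat) * 0.99)]))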

Regards,
Ignat

On Tue, Aug 18, 2020 at 9:40 PM John Dorminy <jdorminy at redhat.com> wrote:

> For what it's worth, I just ran two tests on a machine with dm-crypt
> using the cipher_null:ecb cipher. Results are mixed; not offloading IO
> submission can result in anywhere from a -27% to a +23% change in
> throughput across a selection of IO patterns on HDDs and SSDs.
>
> (Note that the IO submission thread also reorders IO to attempt to
> submit it in sector order, so that is an additional difference between
> the two modes -- it's not just "offload writes to another thread" vs
> "don't offload writes".) The summary (for my FIO workloads focused on
> parallelism) is that offloading is useful for high IO depth random
> writes on SSDs, and for long sequential small writes on HDDs.
> Offloading reduced throughput for immensely high IO depths on SSDs,
> where I would guess lock contention is reducing effective IO depth to
> the disk; and for low IO depths of sequential writes on HDDs, where I
> would guess (as it would for a zoned device) preserving submission order
> is better than attempting to reorder before submission.
>
> Two test regimes: randwrite on 7x Samsung SSD 850 PRO 128G, somewhat
> aged, behind an LSI MegaRAID card providing raid0; 6 processors
> (Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz); 128G RAM. And seqwrite,
> on a software raid0 (512k chunk size) of 4 HDDs on the same machine.
> Scheduler 'none' for both. LSI card in writethrough cache mode.
> All data in MB/s.
>
>
> depth  jobs  bs   dflt  no_wq  %chg  raw disk
> ---------------- randwrite, SSD ----------------
>   128     1  4k    282    282     0       285
>   256     4  4k    251    183   -27       283
>  2048     4  4k    266    283    +6       284
>     1     4  1m    433    414    -4       403
> ---------------- seqwrite, HDD -----------------
>   128     1  4k     87    107   +23        86
>   256     4  4k    101     90   -11        91
>  2048     4  4k    273    233   -15       249
>     1     4  1m    144    146    +1       146
>
>
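
(For anyone who wants to repeat something like the sweep above on their own
hardware, here is a rough Python sketch that shells out to fio with the
depth/jobs/bs combinations from the table. The device path, job name and
runtime are made up, fio with the libaio engine is assumed, and the target
device will be overwritten.)

# Sketch only: run fio once per (iodepth, numjobs, blocksize) combination
# from the table above and print the command being executed.
import subprocess

DEVICE = "/dev/mapper/test-crypt"   # hypothetical scratch device; data will be destroyed

# (iodepth, numjobs, blocksize) combinations from the table above.
CASES = [(128, 1, "4k"), (256, 4, "4k"), (2048, 4, "4k"), (1, 4, "1m")]

for depth, jobs, bs in CASES:
    cmd = [
        "fio", "--name=sweep", "--filename=" + DEVICE,
        "--rw=randwrite",               # use --rw=write for the sequential HDD runs
        "--ioengine=libaio", "--direct=1",
        f"--iodepth={depth}", f"--numjobs={jobs}", f"--bs={bs}",
        "--runtime=60", "--time_based", "--group_reporting",
    ]
    print(">>>", " ".join(cmd))
    subprocess.run(cmd, check=True)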