[dm-devel] [lvm-devel] dm thin: optimize away writing all zeroes to unprovisioned blocks
Eric Wheeler
lvm-dev at lists.ewheeler.net
Tue Dec 9 08:02:12 UTC 2014
On Fri, 5 Dec 2014, Mike Snitzer wrote:
> I do wonder what the performance impact is on this for dm. Have you
> tried a (worst case) test of writing blocks that are zero filled,
Jens, thank you for your help w/ fio for generating zeroed writes!
Clearly fio is superior to dd as a sequential benchmarking tool; I was
actually able to push up against the system's memory bandwidth.
Results:
I hacked block/loop.c and md/dm-thin.c to always call bio_is_zero_filled()
and then complete without writing to disk, regardless of the return value
from bio_is_zero_filled(). In loop.c this was done in
do_bio_filebacked(), and for dm-thin.c this was done within
provision_block().
This allows us to compare the performance difference between the simple
loopback block device driver vs the more complex dm-thinp implementation
just prior to block allocation. These benchmarks give us a sense of how
performance differences relate between bio_is_zero_filled() and block
device implementation complexity, in addition to the raw performance of
bio_is_zero_filled in best- and worst-case scenarios.
Since we always complete without writing after the call to
bio_is_zero_filled, regardless of the bio's content (all zeros or not), we
can benchmark the difference in the common use case of random data, as
well as the edge case of skipping writes for bios that contain all zeros
when writing to unallocated space of thin-provisioned volumes.
These benchmarks were performed under KVM, so expect them to be lower
bounds due to overhead. The hardware is an Intel(R) Xeon(R) CPU E3-1230 V2
@ 3.30GHz. The VM was allocated 4GB of memory with 4 CPU cores.
Benchmarks were performed using fio-2.1.14-33-gf8b8f:

  fio --name=writebw \
      --rw=write \
      --time_based \
      --runtime=7 --ramp_time=3 \
      --norandommap \
      --ioengine=libaio \
      --group_reporting \
      --direct=1 \
      --bs=1m \
      --filename=/dev/X \
      --numjobs=Y

Random data was tested using:
  --zero_buffers=0 --scramble_buffers=1

Zeroed data was tested using:
  --zero_buffers=1 --scramble_buffers=0
Values below are from aggrb.
              | dm-thinp (MB/s) | loopback (MB/s) | loop faster by
==============+=================+=================+================
random jobs=4 |         18496.0 |         33522.0 |          1.68x
zeros  jobs=4 |          8119.2 |          9767.2 |          1.20x
--------------+-----------------+-----------------+----------------
random jobs=1 |          7330.5 |         12330.0 |          1.81x
zeros  jobs=1 |          4965.2 |          6799.9 |          1.11x
We can see that fio reports a best-case performance of 33.5GB/s with
random data using 4 jobs in this test environment within loop.c.
For the real-world best-case within dm-thinp, fio reports 18.5GB/s, which
is relevant for use cases where bio vectors tend to contain non-zero
data, particularly toward the beginning of the vector set.
I expect that the performance difference between loop.c and dm-thinp is
due to implementation complexity of the block device driver, such as
checking the metadata to see if a block must be allocated before calling
provision_block().
(Note that it may be possible for these test values to exceed the memory
bandwidth of the system since we exit early if finding non-zero data in a
biovec, thus the remaining data is not actually inspected but is counted
by fio. Worst-case values should all be below the memory bandwidth
maximum since all data is inspected. I believe memtest86+ says my memory
bandwidth is ~17GB/s.)
--
Eric Wheeler, President eWheeler, Inc. dba Global Linux Security
888-LINUX26 (888-546-8926) Fax: 503-716-3878 PO Box 25107
www.GlobalLinuxSecurity.pro Linux since 1996! Portland, OR 97298