[lvm-devel] dm thin: optimize away writing all zeroes to unprovisioned blocks
Jens Axboe
axboe at kernel.dk
Tue Dec 9 15:31:30 UTC 2014
On 12/09/2014 01:02 AM, Eric Wheeler wrote:
> On Fri, 5 Dec 2014, Mike Snitzer wrote:
>> I do wonder what the performance impact is on this for dm. Have you
>> tried a (worst case) test of writing blocks that are zero filled,
>
> Jens, thank you for your help w/ fio for generating zeroed writes!
> Clearly fio is superior to dd as a sequential benchmarking tool; I was
> actually able to push up against the system's memory bandwidth.
>
> Results:
>
> I hacked drivers/block/loop.c and drivers/md/dm-thin.c to always call
> bio_is_zero_filled() and then complete without writing to disk,
> regardless of the return value from bio_is_zero_filled(). In loop.c
> this was done in do_bio_filebacked(), and for dm-thin.c this was done
> within provision_block().
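>
> For reference, a minimal sketch of what such a check could look like
> (the actual patch is not reproduced here; this assumes the post-3.14
> immutable biovec API, with linux/bio.h, linux/highmem.h and
> linux/string.h pulled in):
>
> static bool bio_is_zero_filled(struct bio *bio)
> {
>         struct bio_vec bvec;
>         struct bvec_iter iter;
>
>         bio_for_each_segment(bvec, bio, iter) {
>                 /* map the segment and scan it for a non-zero byte;
>                  * memchr_inv() returns NULL iff every byte matches 0 */
>                 char *data = kmap_atomic(bvec.bv_page);
>                 void *nonzero = memchr_inv(data + bvec.bv_offset, 0,
>                                            bvec.bv_len);
>                 kunmap_atomic(data);
>                 if (nonzero)
>                         return false;
>         }
>         return true;
> }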
>
> This allows us to compare the performance of the simple loopback block
> device driver against the more complex dm-thinp implementation,
> measured just prior to block allocation. These benchmarks give us a
> sense of how the cost of bio_is_zero_filled() relates to block device
> implementation complexity, in addition to the raw performance of
> bio_is_zero_filled() in best- and worst-case scenarios.
>
> Since we always complete without writing after the call to
> bio_is_zero_filled(), regardless of the bio's content (all zeros or
> not), we can benchmark both the common case of writing random data and
> the edge case of skipping writes for bios that contain all zeros when
> writing to unallocated space of thin-provisioned volumes.
>
> These benchmarks were performed under KVM, so expect them to be lower
> bounds due to virtualization overhead. The hardware is an Intel(R)
> Xeon(R) CPU E3-1230 V2 @ 3.30GHz. The VM was allocated 4GB of memory
> and 4 CPU cores.
>
> Benchmarks were performed using fio-2.1.14-33-gf8b8f with the
> following options:
> --name=writebw
> --rw=write
> --time_based
> --runtime=7 --ramp_time=3
> --norandommap
> --ioengine=libaio
> --group_reporting
> --direct=1
> --bs=1m
> --filename=/dev/X
> --numjobs=Y
>
> Random data was tested using:
> --zero_buffers=0 --scramble_buffers=1
>
> Zeroed data was tested using:
> --zero_buffers=1 --scramble_buffers=0
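>
> For example, the zeroed-data dm-thinp run with 4 jobs combines these
> into (the device path below is a placeholder):
>
> fio --name=writebw --rw=write --time_based --runtime=7 --ramp_time=3 \
>     --norandommap --ioengine=libaio --group_reporting --direct=1 \
>     --bs=1m --filename=/dev/X --numjobs=4 \
>     --zero_buffers=1 --scramble_buffers=0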
>
> Values below are the aggregate bandwidth (aggrb) reported by fio.
>
>                | dm-thinp (MB/s) | loopback (MB/s) | loopback faster by
> ===============+=================+=================+===================
> random, jobs=4 |         18496.0 |         33522.0 |             1.81x
> zeros,  jobs=4 |          8119.2 |          9767.2 |             1.20x
> ===============+=================+=================+===================
> random, jobs=1 |          7330.5 |         12330.0 |             1.68x
> zeros,  jobs=1 |          4965.2 |          6799.9 |             1.37x
This looks more reasonable in terms of throughput.
One major worry here is that checking every write will blow your cache,
so you could have a major impact on performance in general. Even for
O_DIRECT writes, you are now accessing the memory. Have you looked into
doing non-temporal memory compares instead? I think that would be the
way to go.
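
As a rough illustration of the idea, a userspace zero-check built on
non-temporal loads (SSE4.1 MOVNTDQA) might look like the sketch below.
This is hypothetical, not the patch under discussion: a kernel version
would need kernel_fpu_begin()/kernel_fpu_end() around the SSE usage,
and MOVNTDQA only fully bypasses the cache hierarchy on
write-combining memory; on ordinary write-back memory it acts as a
hint.

#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128, _mm_testz_si128 */
#include <stdbool.h>
#include <stddef.h>

/* Assumes buf is 16-byte aligned and len is a multiple of 16
 * (bio payloads are multiples of the sector size, so this holds). */
static bool buf_is_zero_nt(void *buf, size_t len)
{
        __m128i acc = _mm_setzero_si128();
        size_t i;

        for (i = 0; i < len; i += 16)
                acc = _mm_or_si128(acc,
                        _mm_stream_load_si128((__m128i *)((char *)buf + i)));

        /* _mm_testz_si128(acc, acc) is 1 iff acc == 0, i.e. the
         * buffer contained no non-zero bytes */
        return _mm_testz_si128(acc, acc);
}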
--
Jens Axboe