[dm-devel] max_sectors_kb limitations with VDO and dm-thin

Ryan Norwood ryan.p.norwood at gmail.com
Tue Apr 23 17:02:38 UTC 2019


I have added vdo-devel to the conversation:
https://www.redhat.com/archives/vdo-devel/2019-April/msg00017.html

Here is some more info to describe the specific issue:

A dm-thin volume is configured with a chunk/block size that determines the
minimum allocation size it can track, anywhere between 64KiB and 1GiB. If an
application writes to a dm-thin block device and that IO operation completely
overlaps a thin block, dm-thin will skip zeroing the newly allocated block
before performing the write. This is a big performance optimization, as it
effectively halves the IO for large sequential writes. When a block device
has a snapshot, the data is referenced by both the original block and the
snapshot. If a write is issued, dm-thin will normally allocate a new chunk,
copy the old data to that new chunk, and then perform the write. If the new
write completely overlaps a chunk, it will skip the copy.
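
To make the overlap condition concrete, here is a rough Python model of the
per-bio decision (the block size, function and variable names are mine for
illustration, not the kernel's):

    THIN_BLOCK_SIZE = 512 * 1024  # pool chunk/block size in bytes (example)

    def fully_overwrites_block(offset, length, block_size=THIN_BLOCK_SIZE):
        # True only if the write covers an entire, aligned thin block, so
        # dm-thin can skip zeroing a fresh block (or copying a shared one).
        return offset % block_size == 0 and length >= block_size

    print(fully_overwrites_block(0, 4 * 1024))    # False -> zero/copy first
    print(fully_overwrites_block(0, 512 * 1024))  # True  -> just write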

So, for example, say a dm-thin block device is created in a thin pool with a
512k block size. The volume is brand new, and an application performs a 4k
sequential write at the beginning of it. dm-thin will do the following:

1) allocate a 512k block
2) write zeros to the block
3) perform the 4k write

That is 516k of writes to service a 4k write (ouch). If the write had
completely covered the 512k block, dm-thin would have skipped the zeroing and
just done the write.
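
Back-of-the-envelope, in Python (assuming the 512k block size above):

    block_kib, write_kib = 512, 4
    written = block_kib + write_kib   # zero-fill plus the application write
    print(written)                    # 516 KiB written
    print(written / write_kib)        # 129x write amplification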

Similarly, assume there is a dm-thin block device with a snapshot, so data is
shared between the two. Again the application performs a 4k write:

1) allocate a new 512k block
2) copy 512k from the old block to the new one
3) perform the 4k write

That is 512k of reads and 516k of writes (big ouch). If the write had
completely covered the 512k block, dm-thin would have skipped all of that
overhead.
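
The same arithmetic for the snapshot/COW case (again assuming a 512k block):

    block_kib, write_kib = 512, 4
    read = block_kib                     # copy-up read from the shared block
    written = block_kib + write_kib      # copy-up write plus the application write
    print(read, written)                 # 512 KiB read, 516 KiB written
    print((read + written) / write_kib)  # 257x total IO amplification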

Now fast forward to VDO. Normally the maximum IO size is determined by the
max_sectors_kb setting in /sys/block/DEVICE/queue. This value is inherited by
stacked DM devices and can be raised by the user up to the hardware limit
max_hw_sectors_kb, which also appears to be inherited by stacked DM devices.
VDO sets this value to 4k, which in turn forces every layer stacked above it
to a 4k maximum as well. Take my previous example but place VDO beneath the
dm-thin volume: all IO, sequential or otherwise, will be split down to 4k,
which completely eliminates the performance optimizations that dm-thin
provides.
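
For reference, a short Python sketch that prints those limits from sysfs for
each device in a stack (the device names are just placeholders; substitute
your own disk, VDO and thin devices):

    def queue_limit(dev, name):
        # Read a request-queue limit, e.g. /sys/block/dm-0/queue/max_sectors_kb
        with open(f"/sys/block/{dev}/queue/{name}") as f:
            return int(f.read())

    for dev in ("sda", "dm-0", "dm-1"):  # e.g. disk -> VDO -> thin volume
        print(dev,
              queue_limit(dev, "max_sectors_kb"),
              queue_limit(dev, "max_hw_sectors_kb"))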

1) Is this known behavior?
2) Is there a possible workaround?


On Tue, Apr 23, 2019 at 6:11 AM Zdenek Kabelac <zkabelac at redhat.com> wrote:

> On 19. 04. 19 at 16:40, Ryan Norwood wrote:
> > We have been using dm-thin layered above VDO and have noticed that our
> > performance is not optimal for large sequential writes as max_sectors_kb
> > and max_hw_sectors_kb for all thin devices are set to 4k due to the VDO
> layer
> > beneath.
> >
> > This effectively eliminates the performance optimizations for sequential
> > writes that skip both zeroing and COW overhead when a write fully overlaps
> > a thin chunk, as all bios are split into 4k, which will always be less
> > than the 64k thin chunk minimum.
> >
> > Is this known behavior? Is there any way around this issue?
>
> Hi
>
> If you require the highest performance, I'd suggest avoiding VDO.
> VDO trades performance for better space utilization.
> It works on 4KiB blocks, so by design it's going to be slow.
>
> I'd also probably not mix two provisioning technologies together - there
> is a nontrivial number of problematic states when the whole device stack
> runs out of real physical space.
>
> Regards
>
> Zdenek
>

