[Vdo-devel] Trying to test thin provisioned LVM on VDO

Mike Snitzer snitzer at redhat.com
Wed Jul 11 13:34:50 UTC 2018


On Wed, Jul 11 2018 at  6:48am -0400,
James Hogarth <james.hogarth at gmail.com> wrote:

> On 11 July 2018 at 11:26, James Hogarth <james.hogarth at gmail.com> wrote:
> > On 11 July 2018 at 10:40, Michael Sclafani <sclafani at redhat.com> wrote:
> >> Based on the error message and a quick scan of the code, it appears dm-thin
> >> disables discards because VDO's max_discard_sectors = 4KB is smaller than
> >> dm-thin's 64KB+ block size. I have no idea why it does that, but if it
> >> neither discards nor zeros out blocks it has written to VDO, that space will
> >> not be reclaimed.
> >>
> >
> > Thanks for confirming the line of thought I was following ...
> >
> > Annoyingly this makes the RHEL documentation pretty useless to follow
> > for setting up thin provisioned volumes on VDO...
> >
> > Unfortunately I don't have a support account to hand to raise this as
> > a RHEL7.5 issue ...
> >
> > Looking at the lvcreate man page, it's not possible to set a chunk
> > size (dm-thin's block size) for a thin pool below 64KiB:
> >
> > -c|--chunksize Size[k|UNIT]
> >               The size of chunks in a snapshot, cache pool or thin
> >               pool.  For snapshots, the value must be a power of 2
> >               between 4KiB and 512KiB and the default value is 4.
> >               For a cache pool the value must be between 32KiB and
> >               1GiB and the default value is 64.  For a thin pool the
> >               value must be between 64KiB and 1GiB and the default
> >               value starts with 64 and scales up to fit the pool
> >               metadata size within 128MiB, if the pool metadata size
> >               is not specified.  The value must be a multiple of
> >               64KiB.  See lvmthin(7) and lvmcache(7) for more
> >               information.
> >
> > What's going to be the best approach to resolve this so that thin
> > provisioning works as expected? It's obviously not advisable to use
> > this configuration as-is, given the inevitable disk exhaustion that
> > will result.
> 
> 
> Mike, you wrote the relevant patch that appears to be causing the
> conflict and preventing dm-thin from passing discards down to VDO here:
> 
> https://www.redhat.com/archives/dm-devel/2012-August/msg00381.html
> 
> I know it was a while back, but do you recall the reason for the
> max_discard_sectors and sectors_per_block comparison?

DM thinp cannot make use of a discard that only covers part of a dm-thinp
block.  So its internal accounting wouldn't work.

Now in the VDO case, you still _really_ want the discard (that DM thinp
cannot use, and as such will not reclaim and reuse the associated block)
to get passed down -- so VDO can recover space, etc.
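
For reference, the check in question (paraphrased from the patch linked
above -- exact function and field names may differ a bit across kernel
versions) disables passdown whenever the data device's discard limits
can't express a whole-block discard:

    /* Paraphrase of dm-thin.c's passdown check: if the data device
     * can't take a discard covering a full thinp block, turn discard
     * passdown off for the pool. */
    static void disable_passdown_if_not_supported(struct pool_c *pt)
    {
            struct pool *pool = pt->pool;
            struct request_queue *q = bdev_get_queue(pt->data_dev->bdev);
            const char *reason = NULL;

            if (!blk_queue_discard(q))
                    reason = "discard unsupported";
            else if (q->limits.max_discard_sectors < pool->sectors_per_block)
                    reason = "max discard sectors smaller than a block";

            if (reason) {
                    DMWARN("%s: disabling discard passdown.", reason);
                    pt->adjusted_pf.discard_passdown = false;
            }
    }

VDO advertising max_discard_sectors of 8 (4KB) trips the second test
against thinp's sectors_per_block of 128+ (64KB+), so passdown is
switched off.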

> From the VDO code it appears untenable to increase maxDiscardSectors
> without major performance impact - to the extent of I/O stalls.

That needs to be explored further.  Only allowing 4K discards is also a
serious source of performance loss (by forcing the block core's
blkdev_issue_discard to iterate at such a small granularity).
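
To put rough numbers on that iteration cost (a quick userspace sketch,
illustrative only -- the 1GiB figure is just an example):

    #include <stdio.h>

    int main(void)
    {
            unsigned long long discard = 1ULL << 30; /* example: 1GiB discard */
            unsigned long long vdo_gran = 4096;      /* VDO's 4KB limit */
            unsigned long long thin_gran = 65536;    /* typical 64KB thinp block */

            /* bios blkdev_issue_discard must build at each granularity */
            printf("4KB granularity:  %llu bios\n", discard / vdo_gran);  /* 262144 */
            printf("64KB granularity: %llu bios\n", discard / thin_gran); /*  16384 */
            return 0;
    }

That's 16x the bios for the same range, before VDO even starts its
per-discard work.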

Pretty sure Zdenek found that VDO's discard performance was _very_
slow.

> So it looks like the only way to make this work is a change to dm-thin
> to ensure the discards are still passed to the VDO layer below it.

Not opposed to adding that.  Think it'll require a new feature though,
e.g. "discard_passdown".  We already have "no_discard_passdown" -- which
is safe, whereas "discard_passdown" could be unsafe (if the device simply
doesn't support discards at all) -- so the constraint for the
"discard_passdown" override must be that the pool's underlying data
device does actually support discard.
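
For illustration (hypothetical -- no such feature arg exists today, and
the device names are made up), the pool table would carry the override
the same way the existing feature args are given, i.e.
start length thin-pool <metadata dev> <data dev> <block size in sectors>
<low water mark> <#features> <features...>:

    # today: passdown disabled explicitly (safe regardless of data device)
    0 2097152 thin-pool /dev/mapper/meta /dev/mapper/vdo0 128 32768 1 no_discard_passdown

    # hypothetical override: force passdown despite max_discard_sectors,
    # only permitted when the data device itself supports discard
    0 2097152 thin-pool /dev/mapper/meta /dev/mapper/vdo0 128 32768 1 discard_passdown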

But all said, discard passdown happens as a side-effect at the end of
dm-thinp's discard processing (which is all done under dm-bio-prison
locking at thinp-blocksize granularity).  As such it could become quite
complex to update dm-thinp's discard code-path to process discards that
don't cover an entire thinp block.  Might not be awful, but just letting
you know as an upfront disclaimer.

Another option might be to see what shit hits the fan if we were to
relax the DM thinp blocksize all the way down to 4K.  It'll definitely
put pressure on the thinp metadata (a 4K blocksize means 16x as many
block mappings as the default 64K -- e.g. ~268 million blocks for a 1TiB
pool instead of ~16.8 million).  Could result in a serious performance
hit, and more side-effects I cannot divine at the moment.  But it is a
"cheap" way forward.  In general, though, we'd probably want to gate the
use of such a small blocksize on some sort of i-know-what-i'm-doing
feature.

Mike



