[dm-devel] [RFC v2 PATCH 05/10] dm thin: add methods to set and get reserved space
Brian Foster
bfoster at redhat.com
Wed Apr 13 20:41:18 UTC 2016
On Wed, Apr 13, 2016 at 02:33:52PM -0400, Brian Foster wrote:
> On Wed, Apr 13, 2016 at 10:44:42AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 12, 2016 at 12:42:48PM -0400, Brian Foster wrote:
> > > From: Joe Thornber <ejt at redhat.com>
> > >
> > > Experimental reserve interface for XFS guys to play with.
> > >
> > > I have big reservations (no pun intended) about this patch.
> > >
> > > [BF:
> > > - Support for reservation reduction.
> > > - Support for space provisioning.
> > > - Condensed to a single function.]
> > >
> > > Not-Signed-off-by: Joe Thornber <ejt at redhat.com>
> > > Not-Signed-off-by: Mike Snitzer <snitzer at redhat.com>
> > > ---
> > > drivers/md/dm-thin.c | 181 ++++++++++++++++++++++++++++++++++++++++++++++++---
> > > 1 file changed, 171 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
> > > index 92237b6..32bc5bd 100644
> > > --- a/drivers/md/dm-thin.c
> > > +++ b/drivers/md/dm-thin.c
> ...
> > > @@ -4271,6 +4343,94 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
> > > limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
> > > }
> > >
> > > +static int thin_provision_space(struct dm_target *ti, sector_t offset,
> > > + sector_t len, sector_t *res)
> > > +{
> > > + struct thin_c *tc = ti->private;
> > > + struct pool *pool = tc->pool;
> > > + sector_t end;
> > > + dm_block_t pblock;
> > > + dm_block_t vblock;
> > > + int error;
> > > + struct dm_thin_lookup_result lookup;
> > > +
> > > + if (!is_factor(offset, pool->sectors_per_block))
> > > + return -EINVAL;
> > > +
> > > + if (!len || !is_factor(len, pool->sectors_per_block))
> > > + return -EINVAL;
> > > +
> > > + if (res && !is_factor(*res, pool->sectors_per_block))
> > > + return -EINVAL;
> > > +
> > > + end = offset + len;
> > > +
> > > + while (offset < end) {
> > > + vblock = offset;
> > > + do_div(vblock, pool->sectors_per_block);
> > > +
> > > + error = dm_thin_find_block(tc->td, vblock, true, &lookup);
> > > + if (error == 0)
> > > + goto next;
> > > + if (error != -ENODATA)
> > > + return error;
> > > +
> > > + error = alloc_data_block(tc, &pblock);
> >
> > So this means that if fallocate wants to BDEV_RES_PROVISION N blocks, it must
> > first increase the reservation (BDEV_RES_MOD) by N blocks to avoid using up
> > space that was previously reserved by some other caller. I think?
> >
>
> Yes, assuming this is being called from a filesystem using the
> reservation mechanism.
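(To spell out the sequence I have in mind, calling the target hook wired
up here directly; a real filesystem caller would of course go through
whatever block layer plumbing the rest of the series provides.  Error
handling is trimmed, and note I'm reading set_reserve_count() as taking
an absolute value, hence the GET first:)

	sector_t res;

	/* 1. current reservation, in sectors */
	ti->type->reserve_space(ti, BDEV_RES_GET, 0, 0, &res);

	/* 2. grow it to cover the range we're about to provision */
	res += len;
	ti->type->reserve_space(ti, BDEV_RES_MOD, 0, 0, &res);

	/* 3. provision; *res is decremented as reserved blocks get mapped */
	ti->type->reserve_space(ti, BDEV_RES_PROVISION, offset, len, &res);
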
>
> > > + if (error)
> > > + return error;
> > > +
> > > + error = dm_thin_insert_block(tc->td, vblock, pblock);
> >
> > Having reserved and mapped blocks, what happens when we try to read them?
> > Do we actually get zeroes, or does the read go straight through to whatever
> > happens to be in the disk blocks? I don't think it's correct that we could
> > BDEV_RES_PROVISION and end up with stale credit card numbers from some other
> > thin device.
> >
>
> Agree, but I'm not really sure how this works in thinp tbh. fallocate
> wasn't really on my mind when doing this. I was simply trying to cobble
> together what I could to facilitate making progress on the fs parts
> (e.g., I just needed a call that allocated blocks and consumed
> reservation in the process).
>
> Skimming through the dm-thin code, it looks like a (configurable) block
> zeroing mechanism can be triggered from somewhere around
> provision_block()->schedule_zero(), depending on whether the incoming
> write overwrites the newly allocated block. If that's the case, then I
> suspect that means reads would just fall through to the block and return
> whatever was on disk. This code would probably need to tie into that
> zeroing mechanism one way or another to deal with that issue. (Though
> somebody who actually knows something about dm-thin should verify that.
> :)
>
BTW, if that mechanism is in fact doing I/O (i.e., physically writing
zeroes at allocation time), it might not be the appropriate solution for
fallocate. Perhaps we'd have to consider an unwritten flag or some such
in dm-thin, if possible.
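Purely to illustrate what I mean (the "unwritten" field and the check
below are hypothetical, not something dm-thin has today): the lookup
result could carry an unwritten bit that the read mapping path honours
by zero-filling the bio rather than passing it through to the data
device, similar to how unwritten extents behave on the fs side:

	struct dm_thin_lookup_result lookup;

	/* ... after a successful dm_thin_find_block() ... */

	if (bio_data_dir(bio) == READ && lookup.unwritten) {
		/* provisioned but never written: don't expose stale data */
		zero_fill_bio(bio);
		bio_endio(bio);
		return DM_MAPIO_SUBMITTED;
	}
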
Brian
> Brian
>
> > (PS: I don't know enough about thinp to know if this has already been taken
> > care of. I didn't see anything, but who knows what I missed. :))
> >
> > --D
> >
> > > + if (error)
> > > + return error;
> > > +
> > > + if (res && *res)
> > > + *res -= pool->sectors_per_block;
> > > +next:
> > > + offset += pool->sectors_per_block;
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int thin_reserve_space(struct dm_target *ti, int mode, sector_t offset,
> > > + sector_t len, sector_t *res)
> > > +{
> > > + struct thin_c *tc = ti->private;
> > > + struct pool *pool = tc->pool;
> > > + sector_t blocks;
> > > + unsigned long flags;
> > > + int error;
> > > +
> > > + if (mode == BDEV_RES_PROVISION)
> > > + return thin_provision_space(ti, offset, len, res);
> > > +
> > > + /* res required for get/set */
> > > + error = -EINVAL;
> > > + if (!res)
> > > + return error;
> > > +
> > > + if (mode == BDEV_RES_GET) {
> > > + spin_lock_irqsave(&tc->pool->lock, flags);
> > > + *res = tc->reserve_count * pool->sectors_per_block;
> > > + spin_unlock_irqrestore(&tc->pool->lock, flags);
> > > + error = 0;
> > > + } else if (mode == BDEV_RES_MOD) {
> > > + /*
> > > + * @res must always be a factor of the pool's blocksize; upper
> > > + * layers can rely on the bdev's minimum_io_size for this.
> > > + */
> > > + if (!is_factor(*res, pool->sectors_per_block))
> > > + return error;
> > > +
> > > + blocks = *res;
> > > + (void) sector_div(blocks, pool->sectors_per_block);
> > > +
> > > + error = set_reserve_count(tc, blocks);
> > > + }
> > > +
> > > + return error;
> > > +}
> > > +
> > > static struct target_type thin_target = {
> > > .name = "thin",
> > > .version = {1, 18, 0},
> > > @@ -4285,6 +4445,7 @@ static struct target_type thin_target = {
> > > .status = thin_status,
> > > .iterate_devices = thin_iterate_devices,
> > > .io_hints = thin_io_hints,
> > > + .reserve_space = thin_reserve_space,
> > > };
> > >
> > > /*----------------------------------------------------------------*/
> > > --
> > > 2.4.11
> > >