[dm-devel] [RFC PATCH] dm-zoned: extend the way of exposing zoned block device

Dmitry Fomichev Dmitry.Fomichev at wdc.com
Wed Jan 8 22:46:03 UTC 2020


> -----Original Message-----
> From: linux-block-owner at vger.kernel.org <linux-block-
> owner at vger.kernel.org> On Behalf Of Bob Liu
> Sent: Wednesday, January 8, 2020 8:46 AM
> To: Damien Le Moal <Damien.LeMoal at wdc.com>; dm-devel at redhat.com
> Cc: linux-block at vger.kernel.org; snitzer at redhat.com; Dmitry Fomichev
> <Dmitry.Fomichev at wdc.com>; martin.petersen at oracle.com;
> shirley.ma at oracle.com
> Subject: Re: [RFC PATCH] dm-zoned: extend the way of exposing zoned
> block device
> 
> On 1/8/20 3:40 PM, Damien Le Moal wrote:
> > On 2020/01/08 16:13, Nobody wrote:
> >> From: Bob Liu <bob.liu at oracle.com>
> >>
> >> Motivation:
> >> Now the dm-zoned device mapper target exposes a zoned block
> device(ZBC) as a
> >> regular block device by storing metadata and buffering random writes in
> >> conventional zones.
> >> This way is not very flexible, there must be enough conventional zones
> and the
> >> performance may be constrained.
> >> By putting metadata(also buffering random writes) in separated device
> we can get
> >> more flexibility and potential performance improvement e.g by storing
> metadata
> >> in faster device like persistent memory.
> >>
> >> This patch try to split the metadata of dm-zoned to an extra block
> >> device instead of zoned block device itself.
> >> (Buffering random writes also in the todo list.)
> >>
> >> Patch is at the very early stage, just want to receive some feedback about
> >> this extension.
> >> Another option is to create an new md-zoned device with separated
> metadata
> >> device based on md framework.
> >
> > For metadata only, it should not be hard at all to move to another
> > conventional zone device. It will however be a little more tricky for
> > conventional zones used for data since dm-zoned assumes that this
> random
> > write space is also zoned. Moving this space to a conventional device
> > requires implementing a zone emulation (fake zones) for the regular
> > drive, using a zone size that matches the size of sequential zones.
> >
> 
> Indeed.
> I'll try to implement zone emulation next version.
> 
> > Beyond this, dm-zoned also needs to be changed to accept partial drives
> > and the dm core code to accept mixing of regular and zoned disks (that
> > is forbidden now).
> >
> 
> Do you mean the check in device_area_is_invalid()?
> 
>  320         /*
>  321          * If the target is mapped to zoned block device(s), check
>  322          * that the zones are not partially mapped.
>  323          */
>  324         if (bdev_zoned_model(bdev) != BLK_ZONED_NONE) {
> 
> > Another approach worth exploring is stacking dm-zoned as is on top of a
> > modified dm-linear with the ability to emulate conventional zones on top
> > of a regular block device (you only need report zones method
> > implemented). That is much easier to do. We actually hacked something
> > like that last month to see the performance change and saw a jump from
> > 56 MB/s to 250 MB/s for write intensive workloads (file creation on
> > ext4). I am not so warm in this approach though as it modifies dm-linear
> > and we want to keep it very lean and simple. Modifying dm-zoned to allow
> > support for a device pair is a better approach I think.
> >
> >>
> >> Signed-off-by: Bob Liu <bob.liu at oracle.com>
> >> ---
> >>  drivers/md/dm-zoned-metadata.c |  6 +++---
> >>  drivers/md/dm-zoned-target.c   | 15 +++++++++++++--
> >>  drivers/md/dm-zoned.h          |  1 +
> >>  3 files changed, 17 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-
> metadata.c
> >> index 22b3cb0..89cd554 100644
> >> --- a/drivers/md/dm-zoned-metadata.c
> >> +++ b/drivers/md/dm-zoned-metadata.c
> >> @@ -439,7 +439,7 @@ static struct dmz_mblock
> *dmz_get_mblock_slow(struct dmz_metadata *zmd,
> >>
> >>  	/* Submit read BIO */
> >>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
> >> -	bio_set_dev(bio, zmd->dev->bdev);
> >> +	bio_set_dev(bio, zmd->dev->meta_bdev);
> >>  	bio->bi_private = mblk;
> >>  	bio->bi_end_io = dmz_mblock_bio_end_io;
> >>  	bio_set_op_attrs(bio, REQ_OP_READ, REQ_META | REQ_PRIO);
> >> @@ -593,7 +593,7 @@ static int dmz_write_mblock(struct dmz_metadata
> *zmd, struct dmz_mblock *mblk,
> >>  	set_bit(DMZ_META_WRITING, &mblk->state);
> >>
> >>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
> >> -	bio_set_dev(bio, zmd->dev->bdev);
> >> +	bio_set_dev(bio, zmd->dev->meta_bdev);
> >>  	bio->bi_private = mblk;
> >>  	bio->bi_end_io = dmz_mblock_bio_end_io;
> >>  	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
> >> @@ -620,7 +620,7 @@ static int dmz_rdwr_block(struct dmz_metadata
> *zmd, int op, sector_t block,
> >>  		return -ENOMEM;
> >>
> >>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
> >> -	bio_set_dev(bio, zmd->dev->bdev);
> >> +	bio_set_dev(bio, zmd->dev->meta_bdev);
> >>  	bio_set_op_attrs(bio, op, REQ_SYNC | REQ_META | REQ_PRIO);
> >>  	bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0);
> >>  	ret = submit_bio_wait(bio);
> >> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-
> target.c
> >> index 70a1063..55c64fe 100644
> >> --- a/drivers/md/dm-zoned-target.c
> >> +++ b/drivers/md/dm-zoned-target.c
> >> @@ -39,6 +39,7 @@ struct dm_chunk_work {
> >>   */
> >>  struct dmz_target {
> >>  	struct dm_dev		*ddev;
> >> +	struct dm_dev           *metadata_dev;
> >
> > This naming would be confusing as it implies metadata only. If you also
> > want to move the random write zones to a regular device, then I would
> > suggest names like:
> >
> > ddev -> zoned_bdev (the zoned device, e.g. SMR disk)
> > metadata_dev -> reg_bdev (regular block device, e.g. an SSD)
> >
> 
> Will update.
> 
> > The metadata+random write (fake) zones space can use the regular block
> > device, and all sequential zones are assumed to be on the zoned device.
> > Care must be taken to handle the case of a zoned device that has
> > conventional zones: these could be used as is, not needing any reclaim,
> 
> Agree, that won't make things too complicate.
> Thank you for all the suggestions.
> 
> Regards,
> Bob
> 
> > so potentially contributing to further optimization.
> >
> >>
> >>  	unsigned long		flags;
> >>
> >> @@ -701,6 +702,7 @@ static int dmz_get_zoned_device(struct dm_target
> *ti, char *path)
> >>  		goto err;
> >>  	}
> >>
> >> +	dev->meta_bdev = dmz->metadata_dev->bdev;
> >>  	dev->bdev = dmz->ddev->bdev;
> >>  	(void)bdevname(dev->bdev, dev->name);
> >>
> >> @@ -761,7 +763,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned
> int argc, char **argv)
> >>  	int ret;
> >>
> >>  	/* Check arguments */
> >> -	if (argc != 1) {
> >> +	if (argc != 2) {
> >>  		ti->error = "Invalid argument count";
> >>  		return -EINVAL;
> >>  	}

I like the idea to have an additional device in dm-zoned as the source of random
zones as opposed to using block range concatenation via dm-linear. But,
shouldn't we retain the possibility to use just the conventional zones that exist
on the drive? This seems necessary for backwards compatibility. In the code
above, if the arg count is one, the processing probably should fall back to the
existing way of using drive's conventional zones.

> >> @@ -774,8 +776,16 @@ static int dmz_ctr(struct dm_target *ti, unsigned
> int argc, char **argv)
> >>  	}
> >>  	ti->private = dmz;
> >>
> >> +	/* Get the metadata block device */
> >> +	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
> >> +			&dmz->metadata_dev);
> >> +	if (ret) {
> >> +		dmz->metadata_dev = NULL;
> >> +		goto err;
> >> +	}
> >> +
> >>  	/* Get the target zoned block device */
> >> -	ret = dmz_get_zoned_device(ti, argv[0]);
> >> +	ret = dmz_get_zoned_device(ti, argv[1]);
> >>  	if (ret) {
> >>  		dmz->ddev = NULL;
> >>  		goto err;
> >> @@ -856,6 +866,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned
> int argc, char **argv)
> >>  err_dev:
> >>  	dmz_put_zoned_device(ti);
> >>  err:
> >> +	dm_put_device(ti, dmz->metadata_dev);
> >>  	kfree(dmz);
> >>
> >>  	return ret;
> >> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
> >> index 5b5e493..e789ff5 100644
> >> --- a/drivers/md/dm-zoned.h
> >> +++ b/drivers/md/dm-zoned.h
> >> @@ -50,6 +50,7 @@
> >>   */
> >>  struct dmz_dev {
> >>  	struct block_device	*bdev;
> >> +	struct block_device     *meta_bdev;
> >>
> >>  	char			name[BDEVNAME_SIZE];
> >>
> >>





More information about the dm-devel mailing list