[dm-devel] [RFC PATCH] dm-zoned: extend the way of exposing zoned block device

Bob Liu bob.liu at oracle.com
Thu Jan 9 00:57:36 UTC 2020


On 1/9/20 6:46 AM, Dmitry Fomichev wrote:
>> -----Original Message-----
>> From: linux-block-owner at vger.kernel.org <linux-block-
>> owner at vger.kernel.org> On Behalf Of Bob Liu
>> Sent: Wednesday, January 8, 2020 8:46 AM
>> To: Damien Le Moal <Damien.LeMoal at wdc.com>; dm-devel at redhat.com
>> Cc: linux-block at vger.kernel.org; snitzer at redhat.com; Dmitry Fomichev
>> <Dmitry.Fomichev at wdc.com>; martin.petersen at oracle.com;
>> shirley.ma at oracle.com
>> Subject: Re: [RFC PATCH] dm-zoned: extend the way of exposing zoned
>> block device
>>
>> On 1/8/20 3:40 PM, Damien Le Moal wrote:
>>> On 2020/01/08 16:13, Nobody wrote:
>>>> From: Bob Liu <bob.liu at oracle.com>
>>>>
>>>> Motivation:
>>>> Now the dm-zoned device mapper target exposes a zoned block
>> device(ZBC) as a
>>>> regular block device by storing metadata and buffering random writes in
>>>> conventional zones.
>>>> This way is not very flexible, there must be enough conventional zones
>> and the
>>>> performance may be constrained.
>>>> By putting metadata(also buffering random writes) in separated device
>> we can get
>>>> more flexibility and potential performance improvement e.g by storing
>> metadata
>>>> in faster device like persistent memory.
>>>>
>>>> This patch try to split the metadata of dm-zoned to an extra block
>>>> device instead of zoned block device itself.
>>>> (Buffering random writes also in the todo list.)
>>>>
>>>> Patch is at the very early stage, just want to receive some feedback about
>>>> this extension.
>>>> Another option is to create an new md-zoned device with separated
>> metadata
>>>> device based on md framework.
>>>
>>> For metadata only, it should not be hard at all to move to another
>>> conventional zone device. It will however be a little more tricky for
>>> conventional zones used for data since dm-zoned assumes that this
>> random
>>> write space is also zoned. Moving this space to a conventional device
>>> requires implementing a zone emulation (fake zones) for the regular
>>> drive, using a zone size that matches the size of sequential zones.
>>>
>>
>> Indeed.
>> I'll try to implement zone emulation next version.
>>
>>> Beyond this, dm-zoned also needs to be changed to accept partial drives
>>> and the dm core code to accept mixing of regular and zoned disks (that
>>> is forbidden now).
>>>
>>
>> Do you mean the check in device_area_is_invalid()?
>>
>>  320         /*
>>  321          * If the target is mapped to zoned block device(s), check
>>  322          * that the zones are not partially mapped.
>>  323          */
>>  324         if (bdev_zoned_model(bdev) != BLK_ZONED_NONE) {
>>
>>> Another approach worth exploring is stacking dm-zoned as is on top of a
>>> modified dm-linear with the ability to emulate conventional zones on top
>>> of a regular block device (you only need report zones method
>>> implemented). That is much easier to do. We actually hacked something
>>> like that last month to see the performance change and saw a jump from
>>> 56 MB/s to 250 MB/s for write intensive workloads (file creation on
>>> ext4). I am not so warm in this approach though as it modifies dm-linear
>>> and we want to keep it very lean and simple. Modifying dm-zoned to allow
>>> support for a device pair is a better approach I think.
>>>
>>>>
>>>> Signed-off-by: Bob Liu <bob.liu at oracle.com>
>>>> ---
>>>>  drivers/md/dm-zoned-metadata.c |  6 +++---
>>>>  drivers/md/dm-zoned-target.c   | 15 +++++++++++++--
>>>>  drivers/md/dm-zoned.h          |  1 +
>>>>  3 files changed, 17 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/md/dm-zoned-metadata.c b/drivers/md/dm-zoned-
>> metadata.c
>>>> index 22b3cb0..89cd554 100644
>>>> --- a/drivers/md/dm-zoned-metadata.c
>>>> +++ b/drivers/md/dm-zoned-metadata.c
>>>> @@ -439,7 +439,7 @@ static struct dmz_mblock
>> *dmz_get_mblock_slow(struct dmz_metadata *zmd,
>>>>
>>>>  	/* Submit read BIO */
>>>>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
>>>> -	bio_set_dev(bio, zmd->dev->bdev);
>>>> +	bio_set_dev(bio, zmd->dev->meta_bdev);
>>>>  	bio->bi_private = mblk;
>>>>  	bio->bi_end_io = dmz_mblock_bio_end_io;
>>>>  	bio_set_op_attrs(bio, REQ_OP_READ, REQ_META | REQ_PRIO);
>>>> @@ -593,7 +593,7 @@ static int dmz_write_mblock(struct dmz_metadata
>> *zmd, struct dmz_mblock *mblk,
>>>>  	set_bit(DMZ_META_WRITING, &mblk->state);
>>>>
>>>>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
>>>> -	bio_set_dev(bio, zmd->dev->bdev);
>>>> +	bio_set_dev(bio, zmd->dev->meta_bdev);
>>>>  	bio->bi_private = mblk;
>>>>  	bio->bi_end_io = dmz_mblock_bio_end_io;
>>>>  	bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_META | REQ_PRIO);
>>>> @@ -620,7 +620,7 @@ static int dmz_rdwr_block(struct dmz_metadata
>> *zmd, int op, sector_t block,
>>>>  		return -ENOMEM;
>>>>
>>>>  	bio->bi_iter.bi_sector = dmz_blk2sect(block);
>>>> -	bio_set_dev(bio, zmd->dev->bdev);
>>>> +	bio_set_dev(bio, zmd->dev->meta_bdev);
>>>>  	bio_set_op_attrs(bio, op, REQ_SYNC | REQ_META | REQ_PRIO);
>>>>  	bio_add_page(bio, page, DMZ_BLOCK_SIZE, 0);
>>>>  	ret = submit_bio_wait(bio);
>>>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-
>> target.c
>>>> index 70a1063..55c64fe 100644
>>>> --- a/drivers/md/dm-zoned-target.c
>>>> +++ b/drivers/md/dm-zoned-target.c
>>>> @@ -39,6 +39,7 @@ struct dm_chunk_work {
>>>>   */
>>>>  struct dmz_target {
>>>>  	struct dm_dev		*ddev;
>>>> +	struct dm_dev           *metadata_dev;
>>>
>>> This naming would be confusing as it implies metadata only. If you also
>>> want to move the random write zones to a regular device, then I would
>>> suggest names like:
>>>
>>> ddev -> zoned_bdev (the zoned device, e.g. SMR disk)
>>> metadata_dev -> reg_bdev (regular block device, e.g. an SSD)
>>>
>>
>> Will update.
>>
>>> The metadata+random write (fake) zones space can use the regular block
>>> device, and all sequential zones are assumed to be on the zoned device.
>>> Care must be taken to handle the case of a zoned device that has
>>> conventional zones: these could be used as is, not needing any reclaim,
>>
>> Agree, that won't make things too complicate.
>> Thank you for all the suggestions.
>>
>> Regards,
>> Bob
>>
>>> so potentially contributing to further optimization.
>>>
>>>>
>>>>  	unsigned long		flags;
>>>>
>>>> @@ -701,6 +702,7 @@ static int dmz_get_zoned_device(struct dm_target
>> *ti, char *path)
>>>>  		goto err;
>>>>  	}
>>>>
>>>> +	dev->meta_bdev = dmz->metadata_dev->bdev;
>>>>  	dev->bdev = dmz->ddev->bdev;
>>>>  	(void)bdevname(dev->bdev, dev->name);
>>>>
>>>> @@ -761,7 +763,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned
>> int argc, char **argv)
>>>>  	int ret;
>>>>
>>>>  	/* Check arguments */
>>>> -	if (argc != 1) {
>>>> +	if (argc != 2) {
>>>>  		ti->error = "Invalid argument count";
>>>>  		return -EINVAL;
>>>>  	}
> 
> I like the idea to have an additional device in dm-zoned as the source of random
> zones as opposed to using block range concatenation via dm-linear. But,
> shouldn't we retain the possibility to use just the conventional zones that exist
> on the drive? This seems necessary for backwards compatibility. In the code
> above, if the arg count is one, the processing probably should fall back to the
> existing way of using drive's conventional zones.
> 

Yeah, that's better. Will update.
Thanks! -Bob

>>>> @@ -774,8 +776,16 @@ static int dmz_ctr(struct dm_target *ti, unsigned
>> int argc, char **argv)
>>>>  	}
>>>>  	ti->private = dmz;
>>>>
>>>> +	/* Get the metadata block device */
>>>> +	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
>>>> +			&dmz->metadata_dev);
>>>> +	if (ret) {
>>>> +		dmz->metadata_dev = NULL;
>>>> +		goto err;
>>>> +	}
>>>> +
>>>>  	/* Get the target zoned block device */
>>>> -	ret = dmz_get_zoned_device(ti, argv[0]);
>>>> +	ret = dmz_get_zoned_device(ti, argv[1]);
>>>>  	if (ret) {
>>>>  		dmz->ddev = NULL;
>>>>  		goto err;
>>>> @@ -856,6 +866,7 @@ static int dmz_ctr(struct dm_target *ti, unsigned
>> int argc, char **argv)
>>>>  err_dev:
>>>>  	dmz_put_zoned_device(ti);
>>>>  err:
>>>> +	dm_put_device(ti, dmz->metadata_dev);
>>>>  	kfree(dmz);
>>>>
>>>>  	return ret;
>>>> diff --git a/drivers/md/dm-zoned.h b/drivers/md/dm-zoned.h
>>>> index 5b5e493..e789ff5 100644
>>>> --- a/drivers/md/dm-zoned.h
>>>> +++ b/drivers/md/dm-zoned.h
>>>> @@ -50,6 +50,7 @@
>>>>   */
>>>>  struct dmz_dev {
>>>>  	struct block_device	*bdev;
>>>> +	struct block_device     *meta_bdev;
>>>>
>>>>  	char			name[BDEVNAME_SIZE];
>>>>
>>>>
> 




More information about the dm-devel mailing list