[dm-devel] [PATCHv5 00/14] dm-zoned: metadata version 2

Damien Le Moal Damien.LeMoal at wdc.com
Wed May 13 23:56:23 UTC 2020


Adding Martin.

On 2020/05/11 22:23, Damien Le Moal wrote:
> On 2020/05/11 20:46, Damien Le Moal wrote:
>> On 2020/05/11 20:25, Hannes Reinecke wrote:
>>> On 5/11/20 12:55 PM, Damien Le Moal wrote:
>>>> On 2020/05/11 11:46, Damien Le Moal wrote:
>>>>> Mike,
>>>>>
>>>>> I am still seeing the warning:
>>>>>
>>>>> [ 1827.839756] device-mapper: table: 253:1: adding target device sdj caused an
>>>>> alignment inconsistency: physical_block_size=4096, logical_block_size=4096,
>>>>> alignment_offset=0, start=0
>>>>> [ 1827.856738] device-mapper: table: 253:1: adding target device sdj caused an
>>>>> alignment inconsistency: physical_block_size=4096, logical_block_size=4096,
>>>>> alignment_offset=0, start=0
>>>>> [ 1827.874031] device-mapper: table: 253:1: adding target device sdj caused an
>>>>> alignment inconsistency: physical_block_size=4096, logical_block_size=4096,
>>>>> alignment_offset=0, start=0
>>>>> [ 1827.891086] device-mapper: table: 253:1: adding target device sdj caused an
>>>>> alignment inconsistency: physical_block_size=4096, logical_block_size=4096,
>>>>> alignment_offset=0, start=0
>>>>>
>>>>> when mixing 512B sector and 4KB sector devices. Investigating now.
>>>>
>>>>
>>>> OK. Figured that one out: the 500GB SSD I am using for the regular device is
>>>> 976773168 512B sectors capacity, that is, not a multiple of the 256MB zone size,
>>>> and not even a multiple of 4K. This causes the creation of a 12MB runt zone of
>>>> 24624 sectors, which is ignored. But the start sector of the second device in
>>>> the dm-table remains 976773168, so not aligned on 4K. This causes
>>>> bdev_stack_limits to return an error and the above messages to be printed.
>>>>
>>>> So I think we need to completely ignore the eventual "runt" zone of the regular
>>>> device so that everything aligns correctly. This will need changes in both
>>>> dmzadm and dm-zoned.
>>>>
>>>> Hannes, I can hack something on top of your series. Or can you resend with that
>>>> fixed ?
>>>>
>>>>
>>>>
>>>>
>>> Does this one help?
>>
>> Nope. Same warning.
>>
>>>
>>> diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
>>> index ea43f6892ced..5daca82b5ec7 100644
>>> --- a/drivers/md/dm-zoned-target.c
>>> +++ b/drivers/md/dm-zoned-target.c
>>> @@ -1041,13 +1041,17 @@ static int dmz_iterate_devices(struct dm_target *ti,
>>>   {
>>>          struct dmz_target *dmz = ti->private;
>>>          unsigned int zone_nr_sectors = dmz_zone_nr_sectors(dmz->metadata);
>>> +       unsigned int nr_zones;
>>>          sector_t capacity;
>>>          int r;
>>>
>>> -       capacity = dmz->dev[0].capacity & ~(zone_nr_sectors - 1);
>>> +       nr_zones = DIV_ROUND_DOWN(dmz->dev[0].capacity, zone_nr_sectors);
>>> +       capacity = nr_zones * zone_nr_sectors;
>>
>> 	capacity = round_down(dmz->dev[0].capacity, zone_nr_sectors);
>>
>> is simpler :)
>>
>> In any case, your change does seem to do anything here. Before and after, the
>> capacity is rounded down to full zones, excluding the last runt zone. I think it
>> is to do with the table entry start offset given on DM start by dmzadm...
>>
>>
>>>          r = fn(ti, dmz->ddev[0], 0, capacity, data);
>>>          if (!r && dmz->ddev[1]) {
>>> -               capacity = dmz->dev[1].capacity & ~(zone_nr_sectors - 1);
>>> +               nr_zones = DIV_ROUND_DOWN(dmz->dev[1.capacity,
>>> +                                                  zone_nr_sectors));
>>> +               capacity = nr_zones * zone_nr_sectors;
>>>                  r = fn(ti, dmz->ddev[1], 0, capacity, data);
>>>          }
>>>          return r;
>>>
>>> Cheers,
> 
> The failure of bdev_stack_limits() generating the warning is due to the io_opt
> limit not being compatible with the physical blocks size... Nothing to do with
> zone start/runt zones.
> 
> The problem is here:
> 
> 	/* Optimal I/O a multiple of the physical block size? */
>         if (t->io_opt & (t->physical_block_size - 1)) {
>                 t->io_opt = 0;
>                 t->misaligned = 1;
>                 ret = -1;
>         }
> 
> For the ssd (t), I have io_opt at 512 and the physical block size at 4096,
> changed due to the satcking from the device real 512 phys block to the smr disk
> phys block. The SMR disk io_opt is 0, so the ssd io_opt remains unchaged at 512.
> And we end up with the misaligned trigger since 512 & 4095 = 512...
> 
> I do not understand clearly yet... I wonder why the io_opt for the SMR drive is
> not 4096, same as the physical sector size.

I investigated this and here is what I found out:

When the dual drive setup is started, dm_calculate_queue_limits() is called and
ti->type->iterate_devices(ti, dm_set_device_limits, &ti_limits) executed.

In dm-zoned, the iterate device method executes dm_set_device_limits() twice,
once for each drive of the setup.

The drives I am using are an M.2 SSD with a phys sector size of 512B and an
optimal IO size set to 512. The SMR drive has a phys sector size of 4K and the
optimal IO size set to 0, the drive does not report any value, not uncommon for
HDDs.

The result of bdev_stack_limits() execution from dm_set_device_limits() gives a
DM device phys sector of 4K, no surprise. The io_opt limit though end up being
512 = lcm(0, 512). That results in this condition to trigger:

	/* Optimal I/O a multiple of the physical block size? */
        if (t->io_opt & (t->physical_block_size - 1)) {
                t->io_opt = 0;
                t->misaligned = 1;
                ret = -1;
        }

since 512 & (4096 - 1) is not 0...

It looks to me like we should either have io_opt always be at least the phys
sector size, or change the limit stacking io_opt handling. I am not sure which
approach is best... Thoughts ?

Martin,

Any idea why the io_opt limit is not set to the physical block size when the
drive does not report an optimal transfer length ? Would it be bad to set that
value instead of leaving it to 0 ?

-- 
Damien Le Moal
Western Digital Research






More information about the dm-devel mailing list