[dm-devel] md/dm-mpath: check whether all pgpaths have same uuid in multipath_ctr()

Ewan D. Milne emilne at redhat.com
Tue Mar 23 17:11:30 UTC 2021


On Tue, 2021-03-23 at 15:47 +0800, lixiaokeng wrote:
> 
> On 2021/3/22 22:22, Mike Snitzer wrote:
> > On Mon, Mar 22 2021 at  4:11am -0400,
> > Christoph Hellwig <hch at infradead.org> wrote:
> > 
> > > On Sat, Mar 20, 2021 at 03:19:23PM +0800, Zhiqiang Liu wrote:
> > > > From: Zhiqiang Liu <liuzhiqiang26 at huawei.com>
> > > > 
> > > > When we make IO stress test on multipath device, there will
> > > > be a metadata err because of wrong path. In the test, we
> > > > concurrent execute 'iscsi device login|logout' and
> > > > 'multipath -r' command with IO stress on multipath device.
> > > > In some case, systemd-udevd may have not time to process
> > > > uevents of iscsi device logout|login, and then 'multipath -r'
> > > > command triggers multipathd daemon calls ioctl to load table
> > > > with incorrect old device info from systemd-udevd.
> > > > Then, one iscsi path may be incorrectly attached to another
> > > > multipath which has different uuid. Finally, the metadata err
> > > > occurs when umounting filesystem to down write metadata on
> > > > the iscsi device which is actually not owned by the multipath
> > > > device.
> > > > 
> > > > So we need to check whether all pgpaths of one multipath have
> > > > the same uuid, if not, we should throw a error.
> > > > 
> > > > Signed-off-by: Zhiqiang Liu <liuzhiqiang26 at huawei.com>
> > > > Signed-off-by: lixiaokeng <lixiaokeng at huawei.com>
> > > > Signed-off-by: linfeilong <linfeilong at huawei.com>
> > > > Signed-off-by: Wubo <wubo40 at huawei.com>
> > > > ---
> > > >  drivers/md/dm-mpath.c   | 52
> > > > +++++++++++++++++++++++++++++++++++++++++
> > > >  drivers/scsi/scsi_lib.c |  1 +
> > > >  2 files changed, 53 insertions(+)
> > > > 
> > > > diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
> > > > index bced42f082b0..f0b995784b53 100644
> > > > --- a/drivers/md/dm-mpath.c
> > > > +++ b/drivers/md/dm-mpath.c
> > > > @@ -24,6 +24,7 @@
> > > >  #include <linux/workqueue.h>
> > > >  #include <linux/delay.h>
> > > >  #include <scsi/scsi_dh.h>
> > > > +#include <linux/dm-ioctl.h>
> > > >  #include <linux/atomic.h>
> > > >  #include <linux/blk-mq.h>
> > > > 
> > > > @@ -1169,6 +1170,45 @@ static int parse_features(struct
> > > > dm_arg_set *as, struct multipath *m)
> > > >  	return r;
> > > >  }
> > > > 
> > > > +#define SCSI_VPD_LUN_ID_PREFIX_LEN 4
> > > > +#define MPATH_UUID_PREFIX_LEN 7
> > > > +static int check_pg_uuid(struct priority_group *pg, char
> > > > *md_uuid)
> > > > +{
> > > > +	char pgpath_uuid[DM_UUID_LEN] = {0};
> > > > +	struct request_queue *q;
> > > > +	struct pgpath *pgpath;
> > > > +	struct scsi_device *sdev;
> > > > +	ssize_t count;
> > > > +	int r = 0;
> > > > +
> > > > +	list_for_each_entry(pgpath, &pg->pgpaths, list) {
> > > > +		q = bdev_get_queue(pgpath->path.dev->bdev);
> > > > +		sdev = scsi_device_from_queue(q);
> > > 
> > > Common dm-multipath code should never poke into scsi
> > > internals.  This
> > > is something for the device handler to check.  It probably also
> > > won't
> > > work for all older devices.
> > 
> > Definitely.
> > 
> > But that aside, userspace (multipathd) _should_ be able to do extra
> > validation, _before_ pushing down a new table to the kernel, rather
> > than
> > forcing the kernel to do it.
> > 
> 
> Martin (committer of multipath-tools) said that:
> "Don't get me wrong, I don't argue against tough testing. But we
> should
> be aware that there are always time intervals during which
> multipathd's
> picture of the present devices is different from what the kernel
> sees."
> 
> It is difficult to solve this in multipathd.
> 
> Regards,
> Lixiaokeng
> 

I think the patch is no good.  There are plenty of devices that don't
support VPD page 83h:

int scsi_vpd_lun_id(struct scsi_device *sdev, char *id, size_t id_len)
{
        u8 cur_id_type = 0xff;
        u8 cur_id_size = 0;
        unsigned char *d, *cur_id_str;
        unsigned char __rcu *vpd_pg83;
        int id_size = -EINVAL;

        rcu_read_lock();
        vpd_pg83 = rcu_dereference(sdev->vpd_pg83);
        if (!vpd_pg83) {
                rcu_read_unlock();
                return -ENXIO;
        }

and the DM layer should not be looking at the properties of the
underlying devices in this way anyway.  It should be pushed down
to the table.

-Ewan






More information about the dm-devel mailing list