[dm-devel] [PATCH] multipath-tools:fix unexport/export LUN access permission multipath can't restore

Benjamin Marzinski bmarzins at redhat.com
Tue Jan 23 13:02:31 UTC 2018


On Tue, Jan 23, 2018 at 10:52:35AM +0100, Hannes Reinecke wrote:
> On 01/23/2018 10:25 AM, Wuchongyun wrote:
> > Hi Christophe, Hannes, Martin, Xose and Benjamin,
> > 
> > We meet below issue when doing the unexport/export LUN's access permission test, this patch is a method to fix this issue.
> > If it is convenient for some of you, please help to review this patch, thanks.
> > 
> > fix unexport/export LUN access permission multipath can't restore
> > 
> > Issue:
> > (1)Export LUN1 to host1,Create multipath, LUN1 have two paths: sda and
> > sdb;
> > (2)Unexport LUN1 to host1, host1 can't access LUN1.Kernel produce change
> > uevent for sda and sdb, then uev_update_path been called, which will
> > found the path's wwid have changed and set flag pp->wwid_changed
> > , checker thread's check_path() found this flag will set those paths
> > failed and report to DM, and then this multipath failed.
> > (3)export LUN1 to host1 again. Kernel produce the change uevents, but
> > those uevents are not belong to LUN1(not sda and sdb).
> > So LUN1's related devices' status(wwid) not been updated by uevent when
> > regain this LUN's permission and this multipath will not restore anymore.
> > 
> > Through some experiments we found that:
> > unexport/export LUN to host, kernel may not produce the change uevent
> > for the right devices. It may be other devices, and the result may
> > different.
> > Get wwid by "/lib/udev/scsi_id --whitelisted --device=/dev/%n"will always
> > get the latest wwid: when no access permission return a invalid wwid,
> > when access restore return the origial right wwid.
> > 
> > If we meet this situation above issue will happen:
> > when unexporting LUN we get the right change uevent and refresh this
> > path's wwid to a valid value(HP 3PAR is 30000000000000000);
> > When exporting LUN to the host again we get the other devices' change
> > uevents, then we have no chance to refresh the changed paths'wwid, and
> > this multipath will not restore even it's LUN access permissions have
> > been restored.
> > 
> > This patch can fix this issue by:
> > In check_path() if found flag pp->wwid_changed is set, compare the wwid
> > gotten by scsi_id and the wwid gotten by udev(current get_uid). If the
> > value is not equal, it most likely we got the access permissions or device
> > valid again, scsi_id's wwid may restored, we should produce a change uevent
> > to fresh this device's status.
> > 
> > Signed-off-by: Chongyun Wu <wu.chongyun at h3c.com>
> > ---
> >  libmultipath/defaults.h |    1 +
> >  multipathd/main.c       |   51 +++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 52 insertions(+)
> > 
> > diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
> > index c9e3411..895d3a1 100644
> > --- a/libmultipath/defaults.h
> > +++ b/libmultipath/defaults.h
> > @@ -3,6 +3,7 @@
> >   * and the TEMPLATE in libmultipath/hwtable.c
> >   */
> >  #define DEFAULT_UID_ATTRIBUTE	"ID_SERIAL"
> > +#define DEFAULT_GETUID		"/lib/udev/scsi_id --whitelisted --device=/dev/%n"
> >  #define DEFAULT_UDEVDIR		"/dev"
> >  #define DEFAULT_MULTIPATHDIR	"/" LIB_STRING "/multipath"
> >  #define DEFAULT_SELECTOR	"service-time 0"
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 27cf234..dedc7d2 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -937,6 +937,9 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
> >  	if (pp) {
> >  		struct multipath *mpp = pp->mpp;
> >  
> > +		udev_device_unref(pp->udev);
> > +		pp->udev = udev_device_ref(uev->udev);
> > +
> >  		if (disable_changed_wwids &&
> >  		    (strlen(pp->wwid) || pp->wwid_changed)) {
> >  			char wwid[WWID_SIZE];
> > @@ -1578,6 +1581,54 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
> >  		condlog(2, "%s: path wwid has changed. Refusing to use",
> >  			pp->dev);
> >  		newstate = PATH_DOWN;
> > +
> > +		/*
> > +		 * compare the wwid gotten form scsi_id with the wwid which
> > +		 * gotten form udev. maybe the path wwid have been restored
> > +		 * by LUN been exported to the host again or another
> > +		 * operations making the wwid different form udev database.
> > +		 * If so, should trigger change uevent to refresh this
> > +		 * path's udev datebase.
> > +		*/
> > +		char wwid_old[WWID_SIZE];
> > +		char wwid_form_scsi_id[WWID_SIZE];
> > +		char wwid_form_udev[WWID_SIZE];
> > +		char buff[CALLOUT_MAX_SIZE];
> > +		char *getuid_old = NULL;
> > +
> > +		memset(wwid_form_scsi_id, 0, WWID_SIZE);
> > +		memset(wwid_form_udev, 0, WWID_SIZE);
> > +
> > +		/* save this path's  wwid and getuid*/
> > +		strcpy(wwid_old, pp->wwid);
> > +		if (pp->getuid) {
> > +			getuid_old = buff;
> > +			strcpy(getuid_old, pp->getuid);
> > +		}
> > +
> > +		/* using /lib/udev/scsi_id to get wwid */
> > +		pp->getuid = DEFAULT_GETUID;
> > +		get_uid(pp, PATH_UP, pp->udev);
> > +		strcpy(wwid_form_scsi_id, pp->wwid);
> > +		/* using udev to get wwid */
> > +		pp->getuid = NULL;
> > +		get_uid(pp, PATH_UP, pp->udev);
> > +		strcpy(wwid_form_udev, pp->wwid);
> > +
> > +		if (0 != strcmp(wwid_form_scsi_id, wwid_form_udev)) {
> > +			sysfs_attr_set_value(pp->udev, "uevent", "change",
> > +				strlen("change"));
> > +			condlog(2, "%s: wwid inconsistent triggering change "
> > +				"event to refresh udev database", pp->dev);
> > +		}
> > +
> > +		/* change this path's wwid and getuid back*/
> > +		strcpy(pp->wwid, wwid_old);
> > +		if (getuid_old) {
> > +			strcpy(pp->getuid, getuid_old);
> > +		} else {
> > +			pp->getuid = getuid_old;
> > +		}
> >  	}
> >  
> >  	if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
> > 
> NAK.
> 
> We currently don't handle WWID change / LUN mapping changes correctly
> from the linux kernel; the only way to correctly instantiate this is to
> remove the device from sysfs and rescan the host.
> Everything else will result in partially stale values for the device,
> and an incorrect operation.
> This is not something you can 'fix' on the multipath level.

Yes. The only safe thing to do when we notice this is disable access to
the device. This is what the disable_changed_wwids option. If we see the
same wwid, then it might make sense to update the udev device, but if
the wwid has changed, then we must assume that our device got remapped
underneath us, and we can't handle that.

-Ben
 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare at suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)




More information about the dm-devel mailing list