[dm-devel] [PATCH] multipath-tools:fix unexport/export LUN access permission multipath can't restore
Changwei Ge
ge.changwei at h3c.com
Tue Jan 23 13:06:07 UTC 2018
Hi Ben,
On 2018/1/23 21:03, Benjamin Marzinski wrote:
> On Tue, Jan 23, 2018 at 10:52:35AM +0100, Hannes Reinecke wrote:
>> On 01/23/2018 10:25 AM, Wuchongyun wrote:
>>> Hi Christophe, Hannes, Martin, Xose and Benjamin,
>>>
>>> We meet below issue when doing the unexport/export LUN's access permission test, this patch is a method to fix this issue.
>>> If it is convenient for some of you, please help to review this patch, thanks.
>>>
>>> fix unexport/export LUN access permission multipath can't restore
>>>
>>> Issue:
>>> (1)Export LUN1 to host1,Create multipath, LUN1 have two paths: sda and
>>> sdb;
>>> (2)Unexport LUN1 to host1, host1 can't access LUN1.Kernel produce change
>>> uevent for sda and sdb, then uev_update_path been called, which will
>>> found the path's wwid have changed and set flag pp->wwid_changed
>>> , checker thread's check_path() found this flag will set those paths
>>> failed and report to DM, and then this multipath failed.
>>> (3)export LUN1 to host1 again. Kernel produce the change uevents, but
>>> those uevents are not belong to LUN1(not sda and sdb).
>>> So LUN1's related devices' status(wwid) not been updated by uevent when
>>> regain this LUN's permission and this multipath will not restore anymore.
>>>
>>> Through some experiments we found that:
>>> unexport/export LUN to host, kernel may not produce the change uevent
>>> for the right devices. It may be other devices, and the result may
>>> different.
>>> Get wwid by "/lib/udev/scsi_id --whitelisted --device=/dev/%n"will always
>>> get the latest wwid: when no access permission return a invalid wwid,
>>> when access restore return the origial right wwid.
>>>
>>> If we meet this situation above issue will happen:
>>> when unexporting LUN we get the right change uevent and refresh this
>>> path's wwid to a valid value(HP 3PAR is 30000000000000000);
>>> When exporting LUN to the host again we get the other devices' change
>>> uevents, then we have no chance to refresh the changed paths'wwid, and
>>> this multipath will not restore even it's LUN access permissions have
>>> been restored.
>>>
>>> This patch can fix this issue by:
>>> In check_path() if found flag pp->wwid_changed is set, compare the wwid
>>> gotten by scsi_id and the wwid gotten by udev(current get_uid). If the
>>> value is not equal, it most likely we got the access permissions or device
>>> valid again, scsi_id's wwid may restored, we should produce a change uevent
>>> to fresh this device's status.
>>>
>>> Signed-off-by: Chongyun Wu <wu.chongyun at h3c.com>
>>> ---
>>> libmultipath/defaults.h | 1 +
>>> multipathd/main.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 52 insertions(+)
>>>
>>> diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
>>> index c9e3411..895d3a1 100644
>>> --- a/libmultipath/defaults.h
>>> +++ b/libmultipath/defaults.h
>>> @@ -3,6 +3,7 @@
>>> * and the TEMPLATE in libmultipath/hwtable.c
>>> */
>>> #define DEFAULT_UID_ATTRIBUTE "ID_SERIAL"
>>> +#define DEFAULT_GETUID "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
>>> #define DEFAULT_UDEVDIR "/dev"
>>> #define DEFAULT_MULTIPATHDIR "/" LIB_STRING "/multipath"
>>> #define DEFAULT_SELECTOR "service-time 0"
>>> diff --git a/multipathd/main.c b/multipathd/main.c
>>> index 27cf234..dedc7d2 100644
>>> --- a/multipathd/main.c
>>> +++ b/multipathd/main.c
>>> @@ -937,6 +937,9 @@ uev_update_path (struct uevent *uev, struct vectors * vecs)
>>> if (pp) {
>>> struct multipath *mpp = pp->mpp;
>>>
>>> + udev_device_unref(pp->udev);
>>> + pp->udev = udev_device_ref(uev->udev);
>>> +
>>> if (disable_changed_wwids &&
>>> (strlen(pp->wwid) || pp->wwid_changed)) {
>>> char wwid[WWID_SIZE];
>>> @@ -1578,6 +1581,54 @@ check_path (struct vectors * vecs, struct path * pp, int ticks)
>>> condlog(2, "%s: path wwid has changed. Refusing to use",
>>> pp->dev);
>>> newstate = PATH_DOWN;
>>> +
>>> + /*
>>> + * compare the wwid gotten form scsi_id with the wwid which
>>> + * gotten form udev. maybe the path wwid have been restored
>>> + * by LUN been exported to the host again or another
>>> + * operations making the wwid different form udev database.
>>> + * If so, should trigger change uevent to refresh this
>>> + * path's udev datebase.
>>> + */
>>> + char wwid_old[WWID_SIZE];
>>> + char wwid_form_scsi_id[WWID_SIZE];
>>> + char wwid_form_udev[WWID_SIZE];
>>> + char buff[CALLOUT_MAX_SIZE];
>>> + char *getuid_old = NULL;
>>> +
>>> + memset(wwid_form_scsi_id, 0, WWID_SIZE);
>>> + memset(wwid_form_udev, 0, WWID_SIZE);
>>> +
>>> + /* save this path's wwid and getuid*/
>>> + strcpy(wwid_old, pp->wwid);
>>> + if (pp->getuid) {
>>> + getuid_old = buff;
>>> + strcpy(getuid_old, pp->getuid);
>>> + }
>>> +
>>> + /* using /lib/udev/scsi_id to get wwid */
>>> + pp->getuid = DEFAULT_GETUID;
>>> + get_uid(pp, PATH_UP, pp->udev);
>>> + strcpy(wwid_form_scsi_id, pp->wwid);
>>> + /* using udev to get wwid */
>>> + pp->getuid = NULL;
>>> + get_uid(pp, PATH_UP, pp->udev);
>>> + strcpy(wwid_form_udev, pp->wwid);
>>> +
>>> + if (0 != strcmp(wwid_form_scsi_id, wwid_form_udev)) {
>>> + sysfs_attr_set_value(pp->udev, "uevent", "change",
>>> + strlen("change"));
>>> + condlog(2, "%s: wwid inconsistent triggering change "
>>> + "event to refresh udev database", pp->dev);
>>> + }
>>> +
>>> + /* change this path's wwid and getuid back*/
>>> + strcpy(pp->wwid, wwid_old);
>>> + if (getuid_old) {
>>> + strcpy(pp->getuid, getuid_old);
>>> + } else {
>>> + pp->getuid = getuid_old;
>>> + }
>>> }
>>>
>>> if (newstate == PATH_WILD || newstate == PATH_UNCHECKED) {
>>>
>> NAK.
>>
>> We currently don't handle WWID change / LUN mapping changes correctly
>> from the linux kernel; the only way to correctly instantiate this is to
>> remove the device from sysfs and rescan the host.
>> Everything else will result in partially stale values for the device,
>> and an incorrect operation.
>> This is not something you can 'fix' on the multipath level.
>
> Yes. The only safe thing to do when we notice this is disable access to
> the device. This is what the disable_changed_wwids option. If we see the
> same wwid, then it might make sense to update the udev device, but if
> the wwid has changed, then we must assume that our device got remapped
> underneath us, and we can't handle that.
Thanks for your clue. That's very helpful to us. :-)
Changwei
>
> -Ben
>
>> Cheers,
>>
>> Hannes
>> --
>> Dr. Hannes Reinecke Teamlead Storage & Networking
>> hare at suse.de +49 911 74053 688
>> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
>> HRB 21284 (AG Nürnberg)
>
More information about the dm-devel
mailing list