[dm-devel] [PATCH 7/9] multipathd: ignore failed wwid recheck
Benjamin Marzinski
bmarzins at redhat.com
Tue Feb 26 20:29:30 UTC 2019
On Tue, Feb 26, 2019 at 10:42:49AM +0100, Martin Wilck wrote:
> On Fri, 2019-02-22 at 10:58 -0600, Benjamin Marzinski wrote:
> > If disable_changed_wwids is set, when multipathd gets a change event
> > on
> > a path, it verifies that the wwid hasn't changed in
> > uev_update_path().
> > If get_uid() failed, uev_update_path treated this as a wwid change to
> > 0.
> > This could cause paths to suddenly be dropped due to an issue with
> > getting the wwid. Even if get_uid() failed because the path was
> > down,
> > it no change uevent happend when it later became active, multipathd
> > would continue to ignore the path.
> >
> > Instead, multipathd should neither set nor clear wwid_changed if
> > get_uid() returned failure.
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>
> > ---
> > multipathd/main.c | 7 ++++---
> > 1 file changed, 4 insertions(+), 3 deletions(-)
>
> Hm. Failure of get_uid() is a rather severe problem. With this change,
> if get_uid() fails, pp->wwid won't be reset to the original value, and
> dm_fail_path() won't be called. So the path will continue to be
> considered "good", but it will have a zero WWID from this point on. Or
> am I overlooking someting?
Nope. This patch is buggy. Well, actually, check_path() will call
update_multipath_strings() which will eventually call disassemble_map(),
which will reset the path's WWID to match the multipath device wwid
(since it's empty), which is why I didn't notice the issue right away.
This could easily be solved by simply resetting the WWID when get_uid
fails. But there are bigger issues with the patch. Sometimes get_uid()
doesn't return failure, it just returns 0 and an empty wwid. This still
allows the issue I am seeing to occur.
The problem I'm seeing is that sometimes (like when firmware gets
updated on a controller) multipath is temporarily unable to get the
wwid. If a change event happens during this period, get_uid() will set
the wwid to "", possibly without returning an error. After that,
multipathd will treat the path as if it was down, until a change event
occurs and multipath can successfully get the wwid.
> Maybe we should rather change the treatment of this case in
> check_path(). In uev_update_path(), we could leave the WWID at 0 and
> fail the path. Then in check_path(), when the path is up to be checked
> next time, we'd detect the strlen(pp->wwid) == 0 case (indicating
> previously failed get_uid()) and try to call get_uid() again to see if
> this was just a temporary WWID lookup failure or if the WWID actually
> changed (IOW: pp->wwid differs from pp->mpp->wwid). Would this make
> sense?
Sure. I'll try something along those lines (although I'll probably keep
the old wwid, and just set a flag, so the I don't have to make sure that
all of the code that deals with paths with blank wwids does the correct
thing in this case).
-Ben
> Martin
>
>
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 678ecf8..81ad6c0 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -1234,9 +1234,9 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> > goto out;
> >
> > strcpy(wwid, pp->wwid);
> > - get_uid(pp, pp->state, uev->udev);
> > + rc = get_uid(pp, pp->state, uev->udev);
> >
> > - if (strncmp(wwid, pp->wwid, WWID_SIZE) != 0) {
> > + if (rc == 0 && strncmp(wwid, pp->wwid, WWID_SIZE) != 0)
> > {
> > condlog(0, "%s: path wwid changed from '%s' to
> > '%s'. %s",
> > uev->kernel, wwid, pp->wwid,
> > (disable_changed_wwids ? "disallowing"
> > :
> > @@ -1252,7 +1252,8 @@ uev_update_path (struct uevent *uev, struct
> > vectors * vecs)
> > goto out;
> > }
> > } else {
> > - pp->wwid_changed = 0;
> > + if (rc == 0)
> > + pp->wwid_changed = 0;
> > udev_device_unref(pp->udev);
> > pp->udev = udev_device_ref(uev->udev);
> > conf = get_multipath_config();
>
More information about the dm-devel
mailing list