[dm-devel] [PATCH 4/6] libmultipath: fix suspended devs from failed reloads

Benjamin Marzinski bmarzins at redhat.com
Tue May 16 17:33:50 UTC 2017


On Thu, May 11, 2017 at 10:26:52PM +0200, Martin Wilck wrote:
> On Tue, 2017-05-09 at 11:57 -0500, Benjamin Marzinski wrote:
> > When multipath reloads a device, it can either fail while loading the
> > new table or while resuming the device. If it fails while resuming
> > the
> > device, the device can get stuck in the suspended state.  To fix
> > this,
> > multipath needs to resume the device again so that it can continue
> > using
> > the old table.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins at redhat.com>
> > ---
> >  libmultipath/devmapper.c | 19 ++++++++++++++++++-
> >  libmultipath/devmapper.h |  1 +
> >  2 files changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/libmultipath/devmapper.c b/libmultipath/devmapper.c
> > index 2c4a13a..69b634b 100644
> > --- a/libmultipath/devmapper.c
> > +++ b/libmultipath/devmapper.c
> > @@ -396,7 +396,13 @@ int dm_addmap_reload(struct multipath *mpp, char
> > *params, int flush)
> >  	if (r)
> >  		r = dm_simplecmd(DM_DEVICE_RESUME, mpp->alias,
> > !flush,
> >  				 1, udev_flags, 0);
> > -	return r;
> > +	if (r)
> > +		return r;
> > +
> > +	if (dm_is_suspended(mpp->alias))
> > +		dm_simplecmd(DM_DEVICE_RESUME, mpp->alias, !flush,
> > 1,
> > +			     udev_flags, 0);
> > +	return 0;
> >  }
> 
> Why would the second DM_DEVICE_RESUME call succeed if the first one
> failed?

Because if the first resume fails, device-mapper rolls back to the
original table.

The specific way that someone found this was by running a

multipathd resize

when only some of the paths had been rescaned and noticed the new,
larger, size. Since the first path had changed size (that's all the
multipath looks at to find the new size) the multipath device tried to
reload larger. However, when it tried to resume with the larger size, it
failed since some of the devices were too small.  The second resume lets
it try again with the original table.

Of course, that isn't the only way that this could fail, and it's nice
to be able to gracefully continue using the old table instead of just
being suspended.

-Ben

> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 
> --
> dm-devel mailing list
> dm-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel




More information about the dm-devel mailing list