[dm-devel] dm-multipath: Accept failed paths for multipath maps

Merla, ShivaKrishna ShivaKrishna.Merla at netapp.com
Wed Dec 18 15:28:52 UTC 2013


> From: dm-devel-bounces at redhat.com [mailto:dm-devel-bounces at redhat.com] On Behalf Of Hannes Reinecke
> Sent: Wednesday, December 18, 2013 8:25 AM
> To: Mike Snitzer
> Cc: dm-devel at redhat.com; Stewart, Sean; Alasdair Kergon
> Subject: Re: [dm-devel] dm-multipath: Accept failed paths for multipath maps
> 
> On 12/18/2013 03:08 PM, Mike Snitzer wrote:
> > On Wed, Dec 18 2013 at  2:52am -0500,
> > Hannes Reinecke <hare at suse.de> wrote:
> >
> >> The multipath kernel module rejects any map that contains an invalid
> >> device. However, because multipathd processes events serially, it
> >> will try to push a map with invalid devices if more than one device
> >> failed at the same time.
> >> So we may as well accept those maps and make sure to mark the
> >> paths as down.
> >
> > Why is it so desirable to do this?  Reduced latency to restore at least
> > one valid path when a bunch of paths go down?
> >
> Without this patch, multipathd cannot update the map as long as it
> hasn't caught up with udev.
> During that time, any scheduling decisions made by the kernel are
> necessarily wrong, as it has to rely on the old map.
> 
> > Why can't we just rely on userspace eventually figuring out which paths
> > are failed and pushing a valid map down?
> >
> Oh, you can. This is what we're doing now :-)
> 
> But it will lead to spurious errors during failover when multipathd
> tries to push down maps with invalid devices.
> 
> You also run into a race window between checking the path in
> multipathd and pushing down the map; if the device disappears during
> that window, you won't be able to push down the map.
> If that happens during boot, multipathd won't be able to create the
> map at all, so you might not be able to boot.
> With this patch you at least have the device-mapper device, so
> booting can continue.
> 
> > Are there favorable reports that this new behavior actually helps?
> > Please quantify how.
> >
> NetApp will have; they've been pushing me to forward this patch.
> Sean?
Yes, we have been seeing these issues on RHEL, where a table load fails if
even one device in the map cannot be opened. This can be caused by transient
errors or by a device being in the offline state. As a result, add-path events
fail and new paths are not added to the map.
When we reboot the alternate controllers, we see failures on all paths. We have
seen this with both InfiniBand (IB) and SAS configurations.
This is a much-needed fix for us during controller firmware upgrade tests.
> 
> BTW, SUSE / SLES is running happily with this patch for years now.
> So it can't be at all bad ...
> 
> Cheers,
> 
> Hannes
> --
> Dr. Hannes Reinecke		      zSeries & Storage
> hare at suse.de			      +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
