[dm-devel] dm-multipath: Accept failed paths for multipath maps
snitzer at redhat.com
Fri Jul 18 00:04:12 UTC 2014
On Wed, Dec 18 2013 at 10:28am -0500,
Stewart, Sean <Sean.Stewart at netapp.com> wrote:
> On Wed, 2013-12-18 at 15:25 +0100, Hannes Reinecke wrote:
> > On 12/18/2013 03:08 PM, Mike Snitzer wrote:
> > > On Wed, Dec 18 2013 at 2:52am -0500,
> > > Hannes Reinecke <hare at suse.de> wrote:
> > >
> > >> The multipath kernel module is rejecting any map with an invalid
> > >> device. However, as the multipathd is processing the events serially
> > >> it will try to push a map with invalid devices if more than one
> > >> device failed at the same time.
> > >> So we can as well accept those maps and make sure to mark the
> > >> paths as down.
> > >
> > > Why is it so desirable to do this? Reduced latency to restore at least
> > > one valid path when a bunch of paths go down?
> > >
> > Without this patch multipathd cannot update the map as long as it
> > hasn't caught up with udev.
> > During that time any scheduling decisions by the kernel part are
> > necessarily wrong, as it has to rely on the old map.
> > > Why can't we just rely on userspace eventually figuring out which paths
> > > are failed and pushing a valid map down?
> > >
> > Oh, you can. This is what we're doing now :-)
> > But it will lead to spurious errors during failover when multipathd
> > is trying to push down maps with invalid devices.
> > You are also running into a race window between checking the path in
> > multipathd and pushing down the map; if the device disappears during
> > that time you won't be able to push down the map.
> > If that happens during boot multipathd won't be able to create the
> > map at all, so you might not be able to boot here.
> > With that patch you at least have the device-mapper device, allowing
> > booting to continue.
> > > Are there favorable reports that this new behavior actually helps?
> > > Please quantify how.
> > >
> > NetApp will have; they've been pushing me to forward this patch.
> > Sean?
> Agree. Internally, we have run into numerous cases with Red Hat where
> the "failed in domap" error will occur, due to user space being behind,
> or device detaching taking too long. The most severe case is with
> InfiniBand, where the LLD may place a device offline, then every single
> reload that is trying to add a good path in will fail. I will qualify
> this by saying that I realize it is a problem that the device gets
> placed offline in the first place, but this patch would allow it a
> chance to continue on. The user still has to take manual steps to fix
> the problem in this case, but it seems less disruptive to applications.
> The device detaching case could be quite disruptive to a user in the
> scenario where they are upgrading the firmware on a NetApp E-Series box, and
> with this patch, at least a good path is able to be added in ASAP.
> > BTW, SUSE / SLES is running happily with this patch for years now.
> > So it can't be at all bad ...
> > Cheers,
> > Hannes
> Also agreed. We have seen this functionality in SLES for years, and
> have not run into a problem with it.
Revisiting this can of worms...
As part of full due-diligence on the approach that SUSE and NetApp have
seemingly enjoyed "for years" I reviewed Hannes' v3 patch, fixed one
issue and did some cleanup. I then converted over to using a slightly
different approach wherein the DM core becomes a more willing
co-conspirator in this hack by introducing the ability to have
place-holder devices (dm_dev without an opened bdev) referenced in a DM
table. The work is here:
Here is the diffstat of all 3 patches rolled up:
git diff d4bdac727f1e09412c762f177790a96432738264^..7681ae5ddb5d567800023477be7ddc68f9812a95 | diffstat
 dm-mpath.c |   51 +++++++++++++++++++++++++++++++++++----------------
 dm-table.c |   53 ++++++++++++++++++++++++++++++++++++++++-------------
 dm.c       |    5 ++---
 dm.h       |   12 ++++++++++++
 4 files changed, 89 insertions(+), 32 deletions(-)
But it was only compile tested, because doing more validation of this
work would mean it has a snowball's chance in hell of seeing the light of
upstream. Sadly it doesn't have a good chance; it would require some
1) that mpath is bullet-proof no matter how crazy a user got with fake
place-holder devices in their DM tables (coupled with reinstate_path
2) that the storage configs that experienced problems with the current
DM mpath dm_get_device() failures weren't broken to start with (for
instance ib srp is apparently fixed now.. but those fixes are still
working their way into RHEL) -- or put differently: I need _details_
on the NetApp or other legit storage configs that are still
experiencing problems without a solution to this problem.
... and even with that proof I'm pretty sure Alasdair will hate this
place-holder approach and will push for some other solution.
I'm going away on paternity leave until Sept 8... my _hope_ is that
someone fixes multipath-tools to suck less or that a more clever
solution to this problem is developed locally in DM mpath.
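
For anyone following the thread, the difference between the current
behavior (reject the whole map if any device cannot be opened) and the
place-holder idea (accept the map, keep a device entry without an opened
bdev, and mark that path failed) can be sketched as a small userspace
model in C. This is only an illustration under stated assumptions, not
the patch: every name here (model_dev, model_map, model_open_bdev,
model_load_strict, model_load_lenient) is hypothetical, and the rule
that a name starting with '!' cannot be opened is invented purely to
simulate a vanished device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_PATHS 8

/* Hypothetical stand-in for a dm_dev; bdev_open == false is a place-holder. */
struct model_dev {
    char name[16];
    bool bdev_open;
};

struct model_path {
    struct model_dev dev;
    bool active;            /* false means the path is marked failed */
};

struct model_map {
    struct model_path paths[MODEL_MAX_PATHS];
    size_t nr_paths;        /* all paths, including place-holders */
    size_t nr_valid;        /* paths backed by an opened device */
};

/* Assumption for the model only: a leading '!' simulates a failed open. */
bool model_open_bdev(const char *name)
{
    return name[0] != '!';
}

/* Append one path; a place-holder is recorded but starts out failed. */
static void model_add_path(struct model_map *m, const char *name, bool opened)
{
    struct model_path *p = &m->paths[m->nr_paths++];

    snprintf(p->dev.name, sizeof(p->dev.name), "%s", name);
    p->dev.bdev_open = opened;
    p->active = opened;
    if (opened)
        m->nr_valid++;
}

/* Strict load: any device that cannot be opened rejects the whole map. */
int model_load_strict(struct model_map *m, const char *names[], size_t n)
{
    assert(m != NULL && names != NULL);
    memset(m, 0, sizeof(*m));
    if (n > MODEL_MAX_PATHS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (!model_open_bdev(names[i])) {
            memset(m, 0, sizeof(*m));   /* leave no partial map behind */
            return -1;
        }
        model_add_path(m, names[i], true);
    }
    return 0;
}

/* Lenient load: unopenable devices become failed place-holder paths. */
int model_load_lenient(struct model_map *m, const char *names[], size_t n)
{
    assert(m != NULL && names != NULL);
    memset(m, 0, sizeof(*m));
    if (n > MODEL_MAX_PATHS)
        return -1;
    for (size_t i = 0; i < n; i++)
        model_add_path(m, names[i], model_open_bdev(names[i]));
    return 0;
}
```

With the names "sda", "!sdb", "sdc", the strict load returns -1 and
leaves an empty map, while the lenient load returns 0 with three paths,
two of them valid and the middle one marked failed. The model says
nothing about how a place-holder would later be reinstated, which is
exactly the part that would need to be proven safe.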