[dm-devel] [PATCH v2] multipath -u: test socket connection in non-blocking mode
bmarzins at redhat.com
Wed May 15 16:18:58 UTC 2019
On Thu, Apr 25, 2019 at 09:33:03PM +0200, Martin Wilck wrote:
> On Wed, 2019-04-24 at 11:07 +0200, Martin Wilck wrote:
> > Since commit d7188fcd "multipathd: start daemon after udev trigger",
> > multipathd startup is delayed during boot until after "udev settle"
> > terminates. But "multipath -u" is run by udev workers for storage
> > devices, and attempts to connect to the multipathd socket. This causes
> > a start job for multipathd to be scheduled by systemd, but that job
> > won't be started until "udev settle" finishes. This is not a problem
> > on systems with 129 or less storage units, because the connect() call
> > of "multipath -u" will succeed anyway. But on larger systems, the
> > listen backlog of the systemd socket can be exceeded, which causes
> > connect() calls for the socket to block until multipathd starts up and
> > begins calling accept(). This creates a deadlock situation, because
> > "multipath -u" (called by udev workers) blocks, and thus "udev settle"
> > doesn't finish, delaying multipathd startup. This situation then
> > persists until either the workers or "udev settle" time out. In the
> > former case, path devices might be misclassified as non-multipath
> > devices by "multipath -u".
> >
> > Fix this by using a non-blocking socket fd for connect() and interpret
> > the errno appropriately.
> >
> > This patch reverts most of the changes from commit 8cdf6661
> > "multipath: check on multipathd without starting it". Instead,
> > "multipath -u" does access the socket and start multipath again (which
> > is what we want IMO), but it is now able to detect and handle the
> > "full backlog" situation.
> >
> > Signed-off-by: Martin Wilck <mwilck at suse.com>
> >
> > V2:
> > Use same error reporting convention in __mpath_connect() as in
> > mpath_connect() (Hannes Reinecke). We can't easily change the latter,
> > because it's part of the "public" libmpathcmd API.
> FTR, our customer reported that this patch fixed his problem.
> @Ben, I'd be grateful if you could try it (or have the user try it)
> in your problem case as well.
Unfortunately, I don't have a 129+ path system handy, and the person who
does isn't around right now. The code makes sense, and assuming that I
can verify that it fixes the problem I'm seeing, I'm fine with going
> Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
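
For readers following the thread, the non-blocking connect() idea from the
quoted patch description can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code from the patch: the
function probe_socket(), the enum probe_result and the use of a filesystem
socket path are hypothetical. It relies on the Linux behaviour documented in
connect(2), where a non-blocking UNIX domain socket gets EAGAIN when the
connection cannot be completed immediately (for example because the listen
backlog is full).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of libmpathcmd. */
enum probe_result {
	PROBE_CONNECTED,    /* connect() succeeded */
	PROBE_BACKLOG_FULL, /* a listener exists, but its backlog is full */
	PROBE_NO_LISTENER,  /* nothing is listening on this address */
	PROBE_FAILED,       /* any other error */
};

/*
 * Try to connect to a UNIX stream socket without ever blocking, and
 * classify the outcome by errno. SOCK_NONBLOCK is Linux-specific.
 */
static enum probe_result probe_socket(const char *path)
{
	struct sockaddr_un addr;
	enum probe_result res;
	int fd;

	assert(path != NULL);
	if (strlen(path) >= sizeof(addr.sun_path))
		return PROBE_FAILED;

	fd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
	if (fd < 0)
		return PROBE_FAILED;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	strcpy(addr.sun_path, path); /* length was checked above */

	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
		res = PROBE_CONNECTED;
	else if (errno == EAGAIN)
		/* Would have blocked: the listener is not accepting yet. */
		res = PROBE_BACKLOG_FULL;
	else if (errno == ECONNREFUSED || errno == ENOENT)
		res = PROBE_NO_LISTENER;
	else
		res = PROBE_FAILED;

	close(fd);
	return res;
}
```

In this sketch a caller would treat PROBE_BACKLOG_FULL as "the daemon is
going to start" instead of waiting for accept(), which is the case that
otherwise leads to the deadlock described in the quoted text.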