[dm-devel] [PATCH 2/2] multipathd: handle errors in uxlsnr as fatal

Martin Wilck mwilck at suse.com
Wed Mar 21 07:48:33 UTC 2018


On Wed, 2018-03-21 at 02:43 +0000, Chongyun Wu wrote:
> On 2018/3/21 0:51, Martin Wilck wrote:
> > The ppoll() calls of the uxlsnr thread are vital for proper
> > functioning of multipathd. If the uxlsnr thread can't open the
> > socket or fails to call ppoll() for other reasons, quit the daemon.
> > If we don't do that, multipathd may hang in a state where it can't
> > be terminated any more, because the uxlsnr thread is responsible
> > for handling all signals. This happens e.g. if systemd's
> > multipathd.socket is running and multipathd is started from
> > outside systemd.
> > 
> > 24f2844 "multipathd: fix signal blocking logic" has made this
> > problem more severe. Before that patch, the signals weren't
> > actually blocked in any thread. That's not to say 24f2844 was
> > wrong. I still think it's correct, we just need this one on top.
> > 
> > Signed-off-by: Martin Wilck <mwilck at suse.com>
> > ---
> >   multipathd/uxlsnr.c | 5 +++--
> >   1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/multipathd/uxlsnr.c b/multipathd/uxlsnr.c
> > index cdafd82943e7..6f666663fc6f 100644
> > --- a/multipathd/uxlsnr.c
> > +++ b/multipathd/uxlsnr.c
> > @@ -178,7 +178,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
> >   
> >   	if (ux_sock == -1) {
> >   		condlog(1, "could not create uxsock: %d", errno);
> > -		return NULL;
> > +		exit_daemon();
> >   	}
> >   
> >   	pthread_cleanup_push(uxsock_cleanup, (void *)ux_sock);
> > @@ -187,7 +187,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
> >   	polls = (struct pollfd *)MALLOC((MIN_POLLS + 1) * sizeof(struct pollfd));
> >   	if (!polls) {
> >   		condlog(0, "uxsock: failed to allocate poll fds");
> > -		return NULL;
> > +		exit_daemon();
> >   	}
> >   	sigfillset(&mask);
> >   	sigdelset(&mask, SIGINT);
> > @@ -249,6 +249,7 @@ void * uxsock_listen(uxsock_trigger_fn uxsock_trigger, void * trigger_data)
> >   
> >   			/* something went badly wrong! */
> >   			condlog(0, "uxsock: poll failed with %d", errno);
> > +			exit_daemon();
> >   			break;
> >   		}
> 
> Hi Martin,
> 
> Your analysis is reasonable. A fatal error needs to be handled rather
> than just returned from; otherwise multipathd can't exit normally and
> multipathd commands stop working. I think your patch is OK, but it
> gave me an idea.
> Calling exit_daemon() shuts multipathd down and relies on something
> outside to start it again. Is there a function that can be used to
> deal with the fatal error instead?

I don't think so, that's why I call the error "fatal". But feel free to
come up with a more intelligent solution. In the only case with
practical relevance I've seen so far (socket can't be opened because
it's open in systemd), it's sufficient that multipathd is started via
systemd (this should be done in production anyway), or that the admin
runs "systemctl stop multipathd.socket" before starting the daemon
manually.
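
As a rough illustration of why opening the socket fails in that case,
here is a minimal sketch. It assumes the ux socket is a Linux
abstract-namespace UNIX socket, which this thread does not state.
bind_abstract(), second_bind_errno() and any socket name passed in are
hypothetical names for illustration, not multipathd code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a stream socket to a Linux abstract-namespace name (assumption:
 * Linux only). Returns the fd on success, or -1 with errno set. */
static int bind_abstract(const char *name)
{
	struct sockaddr_un addr;
	size_t len;
	int fd;

	assert(name != NULL);
	len = strlen(name);
	if (len == 0 || len > sizeof(addr.sun_path) - 1) {
		errno = EINVAL;
		return -1;
	}

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd == -1)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* sun_path[0] stays NUL: that selects the abstract namespace. */
	memcpy(addr.sun_path + 1, name, len);

	if (bind(fd, (struct sockaddr *)&addr,
		 offsetof(struct sockaddr_un, sun_path) + 1 + len) == -1) {
		int saved = errno;	/* close() may overwrite errno */

		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}

/* Bind the same name twice, as if a second process tried to claim a
 * socket that is already held. Returns the errno of the second bind,
 * 0 if it unexpectedly succeeded, or -1 if the first bind failed. */
static int second_bind_errno(const char *name)
{
	int first = bind_abstract(name);
	int second;
	int err = 0;

	if (first == -1)
		return -1;

	second = bind_abstract(name);
	if (second == -1)
		err = errno;
	else
		close(second);

	close(first);
	return err;
}
```

On Linux, calling second_bind_errno() with a name nobody else uses is
expected to return EADDRINUSE: the first holder (systemd, in the case
above) still owns the name, so the second bind cannot succeed.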

In general, I don't think it makes sense to try and be overly smart.
Errors that prevent the ux socket from being opened (with the mentioned
exception) are likely to be so severe that any attempts to work around
them will probably fail as well.
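
To make that policy concrete, here is a minimal sketch of a listener
loop that retries on EINTR and treats every other poll failure as
fatal. It is not the uxlsnr code: it uses plain poll() instead of
ppoll(), so the signal mask handling is left out, and
request_shutdown() is a hypothetical stand-in for exit_daemon() that
only sets a flag. classify_poll() and listen_loop() are made-up names
as well.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for exit_daemon(): it only records that a
 * shutdown was requested so that another thread could act on it. */
static volatile int shutdown_requested;

static void request_shutdown(void)
{
	shutdown_requested = 1;
}

enum poll_verdict { POLL_READY, POLL_RETRY, POLL_FATAL };

/* Classify the result of a poll()-style call. rc is the return value,
 * err the errno value captured right after the call. */
static enum poll_verdict classify_poll(int rc, int err)
{
	if (rc >= 0)
		return POLL_READY;	/* timeout (0) or ready fds (> 0) */
	if (err == EINTR)
		return POLL_RETRY;	/* interrupted by a signal */
	return POLL_FATAL;		/* no sensible recovery here */
}

/* Poll until a fatal error occurs, a shutdown is requested, or
 * max_rounds successful rounds have passed. Returns the number of
 * successful rounds. */
static int listen_loop(struct pollfd *fds, nfds_t nfds, int timeout_ms,
		       int max_rounds)
{
	int rounds = 0;

	assert(fds != NULL && max_rounds >= 0);

	while (rounds < max_rounds && !shutdown_requested) {
		int rc = poll(fds, nfds, timeout_ms);
		int err = errno;	/* save before other calls clobber it */

		switch (classify_poll(rc, err)) {
		case POLL_RETRY:
			continue;	/* back to the top of the while loop */
		case POLL_FATAL:
			fprintf(stderr, "listener: poll failed: %s\n",
				strerror(err));
			/* Do not just return and leave the rest of the
			 * process running without its listener. */
			request_shutdown();
			return rounds;
		case POLL_READY:
			/* A real listener would service revents here. */
			rounds++;
			break;
		}
	}
	return rounds;
}
```

The point of the design is that the failing thread never exits
silently: either it keeps polling, or it tells the rest of the process
to go down with it.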

Martin

-- 
Dr. Martin Wilck <mwilck at suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)



