[lvm-devel] stable-2.02 - libdaemon: fix the race of handling SIGTERM timeout

wangjufeng wangjufeng at huawei.com
Wed Sep 25 07:48:23 UTC 2019


From e97d3a14d070a85867ce39fc1749d19c2ac0b685 Mon Sep 17 00:00:00 2001
From: wangjufeng <wangjufeng at huawei.com>
Date: Wed, 25 Sep 2019 15:34:25 +0800
Subject: [PATCH] libdaemon: fix the race of handling SIGTERM timeout

When the lvmetad daemon runs without the "-t" option, system
shutdown may take a long time.

The problem can be reproduced with the following steps:
1. Remove the "-t" option from the lvmetad start arguments.
2. Start a process p1 like this:
        while :
        do
                time service lvm2-lvmetad stop
                sleep 2
                service lvm2-lvmetad start
        done
3. Start a process p2 like this:
        while :
        do
                lvs
        done
4. Stop process p2 with Ctrl+C. After that, stopping lvm2-lvmetad
very often times out.

This happens because, when the system shuts down, lvmetad receives
SIGTERM and _shutdown_requested changes to 1 asynchronously. If a
client thread is still running, s.threads->next is not NULL, so the
condition (_shutdown_requested && !s.threads->next) may not be
satisfied before pselect is called. The daemon then calls pselect,
which gets no timeout when s.idle is unset, and blocks. If no new
request or signal arrives, systemd has to send SIGKILL after the
SIGTERM timeout, which usually takes 90 seconds.

This patch calls _reap(s, 1) to finish all client threads once
_shutdown_requested has changed to 1. It also always passes a
timeout value to pselect, so the daemon cannot block indefinitely
if _shutdown_requested changes to 1 between the check of its value
and the call to pselect.
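
For illustration only, the bounded-wait idea can be sketched as a
standalone loop. This is not the libdaemon code: shutdown_requested,
install_sigterm_handler, bounded_wait and serve_until_shutdown are
hypothetical names, the timeout is given in milliseconds, and a
plain descriptor stands in for the daemon socket. It assumes SIGTERM
is not already blocked by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Set asynchronously by the handler (stand-in for _shutdown_requested). */
static volatile sig_atomic_t shutdown_requested = 0;

static void on_sigterm(int sig)
{
    (void)sig;
    shutdown_requested = 1;
}

/* Hypothetical helper: install the SIGTERM handler. Returns 0 on success. */
static int install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/*
 * Wait until fd is readable, but never longer than timeout_ms.
 * Returns the pselect() result: >0 readable, 0 timed out, <0 error or EINTR.
 */
static int bounded_wait(int fd, long timeout_ms, const sigset_t *wait_mask)
{
    struct timespec timeout;
    fd_set in;

    assert(timeout_ms >= 0);
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_nsec = (timeout_ms % 1000) * 1000000L;
    FD_ZERO(&in);
    FD_SET(fd, &in);
    return pselect(fd + 1, &in, NULL, NULL, &timeout, wait_mask);
}

/*
 * Loop until shutdown is requested or fd reaches EOF.
 * Returns the number of waits that ended in a timeout.
 */
static int serve_until_shutdown(int fd, long timeout_ms)
{
    sigset_t block_term, old_set;
    int timeouts = 0;
    char byte;

    sigemptyset(&block_term);
    sigaddset(&block_term, SIGTERM);

    for (;;) {
        int ret;

        /* Block SIGTERM while the flag is checked. */
        sigprocmask(SIG_BLOCK, &block_term, &old_set);
        if (shutdown_requested) {
            sigprocmask(SIG_SETMASK, &old_set, NULL);
            break;
        }
        /* pselect() restores old_set for the wait, which is always bounded. */
        ret = bounded_wait(fd, timeout_ms, &old_set);
        sigprocmask(SIG_SETMASK, &old_set, NULL);

        if (ret == 0)
            timeouts++;            /* nothing happened: re-check the flag */
        else if (ret > 0 && read(fd, &byte, 1) <= 0)
            break;                 /* EOF or read error on the descriptor */
    }
    return timeouts;
}
```

With a 1000 ms timeout, as in the patch, such a loop re-evaluates
its exit condition about one second after it last checked, even if
no request or signal arrives in the meantime.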

Signed-off-by: wangjufeng <wangjufeng at huawei.com>
---
libdaemon/server/daemon-server.c | 23 ++++++-----------------
1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/libdaemon/server/daemon-server.c b/libdaemon/server/daemon-server.c
index bc58f7b..2a7c291 100644
--- a/libdaemon/server/daemon-server.c
+++ b/libdaemon/server/daemon-server.c
@@ -87,19 +87,6 @@ static int _is_idle(daemon_state s)
               return s.idle && s.idle->is_idle && !s.threads->next;
}
-static struct timespec *_get_timeout(daemon_state s)
-{
-              return s.idle ? s.idle->ptimeout : NULL;
-}
-
-static void _reset_timeout(daemon_state s)
-{
-              if (s.idle) {
-                              s.idle->ptimeout->tv_sec = 1;
-                              s.idle->ptimeout->tv_nsec = 0;
-              }
-}
-
static unsigned _get_max_timeouts(daemon_state s)
{
               return s.idle ? s.idle->max_timeouts : 0;
@@ -563,7 +550,7 @@ void daemon_start(daemon_state s)
               fd_set in;
               sigset_t new_set, old_set;
               int ret;
-
+             struct timespec select_timeout = { .tv_sec = 1, .tv_nsec = 0 };
               /*
                * Switch to C locale to avoid reading large locale-archive file used by
                * some glibc (on some distributions it takes over 100MB). Some daemons
@@ -651,18 +638,20 @@ void daemon_start(daemon_state s)
               sigprocmask(SIG_SETMASK, NULL, &old_set);
                while (!failed) {
-                              _reset_timeout(s);
                               FD_ZERO(&in);
                               FD_SET(s.socket_fd, &in);
                                _reap(s, 0);
                               sigprocmask(SIG_SETMASK, &new_set, NULL);
-                              if (_shutdown_requested && !s.threads->next) {
+                             if (_shutdown_requested) {
+                                             if (s.threads->next) {
+                                                             _reap(s, 1);
+                                             }
                                               sigprocmask(SIG_SETMASK, &old_set, NULL);
                                               INFO(&s, "%s shutdown requested", s.name);
                                               break;
                               }
-                              ret = pselect(s.socket_fd + 1, &in, NULL, NULL, _get_timeout(s), &old_set);
+                             ret = pselect(s.socket_fd + 1, &in, NULL, NULL, &select_timeout, &old_set);
                               sigprocmask(SIG_SETMASK, &old_set, NULL);
                                if (ret < 0) {
--
1.8.3.1