[Cluster-devel] [PATCH] rgmanager: Retry when config is out of sync [RHEL5]

Lon Hohberger lhh at redhat.com
Wed Feb 29 23:53:19 UTC 2012


[This patch is already in RHEL5]

If you add a service to rgmanager v1 or v2 and that
service fails to start on the first node but succeeds
in its initial stop operation, there is a chance that
the remote instance of rgmanager has not yet reread
the configuration, causing the service to be placed
into the 'recovering' state without further action.

This patch causes the originator of the request to
retry the operation up to four times, sleeping three
seconds between attempts, before it moves on to the
next member.

Later versions of rgmanager (e.g. the STABLE3 branch
and derivatives) are unlikely to have this problem,
since configuration updates are not polled but rather
delivered to clients.

Update 22-Feb-2012: The above is incorrect; this was
reproduced on an rgmanager v3 installation.

Resolves: rhbz#796272

Signed-off-by: Lon Hohberger <lhh at redhat.com>
---
 rgmanager/src/daemons/rg_state.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/rgmanager/src/daemons/rg_state.c b/rgmanager/src/daemons/rg_state.c
index 23a4bec..8c5af5b 100644
--- a/rgmanager/src/daemons/rg_state.c
+++ b/rgmanager/src/daemons/rg_state.c
@@ -1801,6 +1801,7 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
 	rg_state_t svcStatus;
 	int target = preferred_target, me = my_id();
 	int ret, x, request = orig_request;
+	int retries;
 	
 	get_rg_state_local(svcName, &svcStatus);
 	if (svcStatus.rs_state == RG_STATE_DISABLED ||
@@ -1933,6 +1934,8 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
 		if (target == me)
 			goto exhausted;
 
+		retries = 0;
+retry:
 		ret = svc_start_remote(svcName, request, target);
 		switch (ret) {
 		case RG_ERUN:
@@ -1942,6 +1945,22 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
 			*new_owner = svcStatus.rs_owner;
 			free_member_list(allowed_nodes);
 			return 0;
+		case RG_ENOSERVICE:
+			/*
+			 * Configuration update pending on remote node?  Give it
+			 * a few seconds to sync up.  rhbz#568126
+			 *
+			 * Configuration updates are synchronized in later releases
+			 * of rgmanager; this should not be needed.
+			 */
+			if (retries++ < 4) {
+				sleep(3);
+				goto retry;
+			}
+			logt_print(LOG_WARNING, "Member #%d has a different "
+				   "configuration than I do; trying next "
+				   "member.", target);
+			/* Deliberate */
 		case RG_EDEPEND:
 		case RG_EFAIL:
 			/* Uh oh - we failed to relocate to this node.
-- 
1.7.7.6
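
For readers who want to try the retry behaviour outside of
rgmanager, here is a minimal standalone sketch of the same
bounded-retry pattern. It is not rgmanager code: the DEMO_*
return codes, the demo_* functions and struct demo_remote are
all hypothetical stand-ins, and the wait step is passed in as a
callback so the logic can be exercised without actually
sleeping. The patch itself calls sleep(3) at that point.

```c
/* Hypothetical return codes standing in for rgmanager's RG_* values. */
enum { DEMO_OK = 0, DEMO_ENOSERVICE = 1, DEMO_EFAIL = 2 };

/* Same bound as the patch: four retries after the first attempt. */
#define DEMO_MAX_RETRIES 4

typedef int (*demo_start_fn)(void *ctx);
typedef void (*demo_wait_fn)(void *ctx);

/*
 * Call start() and repeat it while it reports DEMO_ENOSERVICE, up to
 * DEMO_MAX_RETRIES extra attempts, calling wait_fn() before each retry.
 * Returns the result of the last attempt; any other return code ends
 * the loop immediately.
 */
int demo_start_with_retry(demo_start_fn start, demo_wait_fn wait_fn, void *ctx)
{
	int retries = 0;
	int ret;

	for (;;) {
		ret = start(ctx);
		if (ret != DEMO_ENOSERVICE || retries++ >= DEMO_MAX_RETRIES)
			return ret;
		if (wait_fn)
			wait_fn(ctx);
	}
}

/* Hypothetical remote node that answers ENOSERVICE until it has resynced. */
struct demo_remote {
	int stale_replies;	/* ENOSERVICE answers left before it catches up */
	int attempts;		/* start requests received so far */
	int waits;		/* times the caller waited before retrying */
};

int demo_remote_start(void *ctx)
{
	struct demo_remote *r = ctx;

	r->attempts++;
	if (r->stale_replies > 0) {
		r->stale_replies--;
		return DEMO_ENOSERVICE;
	}
	return DEMO_OK;
}

void demo_remote_wait(void *ctx)
{
	struct demo_remote *r = ctx;

	r->waits++;
}
```

With a three-second wait, a node that never catches up costs the
requester four waits (12 seconds) and five start attempts before
it gives up on that node, which matches the "retries++ < 4"
check in the patch.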



