[Cluster-devel] fenced: don't ignore victim_done messages for reduced victims

Ryan O'Hara rohara at redhat.com
Tue Feb 22 22:26:49 UTC 2011


Looks correct to me. ACK.

On Tue, Feb 22, 2011 at 05:01:27PM -0500, David Teigland wrote:
> 
> Needs ACK for RHEL6.
> 
> 
> When a victim is "reduced" (i.e. fenced skips fencing it because it
> rejoins the cluster cleanly before fenced fences it), it is immediately
> removed from the list of victims, before the "victim_done" message is
> sent for it.  The victim_done message updates the time of the last
> successful fencing operation for a failed node.
> 
> The code that processes received victim_done messages was ignoring the
> message for the reduced victim because the node couldn't be found in
> the victims list.  As a result, the latest fencing information was not
> recorded for the node, and dlm_controld waited indefinitely for fencing
> of the reduced victim to complete.
> 
> The fix is to simply record the information from a victim_done message
> even if the node is not in the victims list.
> 
> bz 678704
> 
> Signed-off-by: David Teigland <teigland at redhat.com>
> ---
>  fence/fenced/cpg.c |   18 ++++++++++++------
>  1 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/fence/fenced/cpg.c b/fence/fenced/cpg.c
> index a8629b9..99e16a0 100644
> --- a/fence/fenced/cpg.c
> +++ b/fence/fenced/cpg.c
> @@ -652,9 +652,9 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  
>  	node = get_node_victim(fd, id->nodeid);
>  	if (!node) {
> +		/* see comment below about no node */
>  		log_debug("receive_victim_done %d:%u no victim nodeid %d",
>  			  hd->nodeid, seq, id->nodeid);
> -		return;
>  	}
>  
>  	log_debug("receive_victim_done %d:%u remove victim %d time %llu how %d",
> @@ -670,9 +670,11 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  	if (hd->nodeid == our_nodeid) {
>  		/* sanity check, I don't think this should happen;
>  		   see comment in fence_victims() */
> -		if (!node->local_victim_done)
> -			log_error("expect local_victim_done");
> -		node->local_victim_done = 0;
> +		if (node) {
> +			if (!node->local_victim_done)
> +				log_error("expect local_victim_done");
> +			node->local_victim_done = 0;
> +		}
>  	} else {
>  		/* save details of fencing operation from master, which
>  		   master saves at the time it completes it */
> @@ -680,8 +682,12 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  				   id->fence_how, id->fence_time);
>  	}
>  
> -	list_del(&node->list);
> -	free(node);
> +	/* we can have no node when reduce_victims() removes it, bz 678704 */
> +
> +	if (node) {
> +		list_del(&node->list);
> +		free(node);
> +	}
>  }
>  
>  /* we know that the quorum value here is consistent with the cpg events
> -- 
> 1.7.1.1
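
For anyone reviewing the ordering problem in isolation, below is a minimal, self-contained C sketch of it. It is a simplified model, not fenced's code. The names struct victim, take_victim(), reduce_victim(), history[] and MAX_NODES are all hypothetical, and the model ignores the master/non-master distinction in the real receive_victim_done(). A victim is removed (as reduce_victims() would remove it) before its victim_done arrives. The patched handler still records the fence time, and frees the list entry only if one was found.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

/* Hypothetical, simplified model; NOT fenced's real data structures. */
struct victim {
    int nodeid;
    struct victim *next;
};

struct node_history {
    uint64_t last_fenced_time;  /* 0 means nothing recorded yet */
    int last_fenced_how;
};

#define MAX_NODES 8

static struct victim *victims;                 /* singly linked victims list */
static struct node_history history[MAX_NODES]; /* indexed by nodeid */

static void add_victim(int nodeid)
{
    struct victim *v = malloc(sizeof(*v));
    if (!v) {
        perror("malloc");
        exit(1);
    }
    v->nodeid = nodeid;
    v->next = victims;
    victims = v;
}

/* Unlink and return the victim with this nodeid, or NULL if absent. */
static struct victim *take_victim(int nodeid)
{
    struct victim **pp;

    for (pp = &victims; *pp; pp = &(*pp)->next) {
        if ((*pp)->nodeid == nodeid) {
            struct victim *v = *pp;
            *pp = v->next;
            return v;
        }
    }
    return NULL;
}

/* Models reduce_victims(): the node rejoined cleanly, so drop it unfenced. */
static void reduce_victim(int nodeid)
{
    free(take_victim(nodeid));
}

/* Models the patched handler: record details even with no victim entry. */
static void receive_victim_done(int nodeid, int how, uint64_t time)
{
    struct victim *v = take_victim(nodeid);

    /* The pre-patch code returned here when no victim was found, so a
       reduced victim's fencing details were never recorded. */
    if (!v)
        printf("victim_done: no victim nodeid %d, recording anyway\n", nodeid);

    if (nodeid >= 0 && nodeid < MAX_NODES) {
        history[nodeid].last_fenced_how = how;
        history[nodeid].last_fenced_time = time;
    }

    free(v);  /* free(NULL) is a no-op, like the patch's "if (node)" guard */
}
```

Calling add_victim(3), then reduce_victim(3), then receive_victim_done(3, 2, t) leaves the victims list empty and history[3].last_fenced_time equal to t. A waiter such as dlm_controld can then see that fencing completed. With the early return restored, history[3] would stay zeroed, which corresponds to the indefinite wait described in bz 678704.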
