[Cluster-devel] fenced: don't ignore victim_done messages for reduced victims
Ryan O'Hara
rohara at redhat.com
Tue Feb 22 22:26:49 UTC 2011
Looks correct to me. ACK.
On Tue, Feb 22, 2011 at 05:01:27PM -0500, David Teigland wrote:
>
> Needs ACK for RHEL6.
>
>
> When a victim is "reduced" (i.e. fenced skips fencing it because it
> rejoins the cluster cleanly before fenced fences it), it is immediately
> removed from the list of victims, before the "victim_done" message is
> sent for it. The victim_done message updates the time of the last
> successful fencing operation for a failed node.
>
> The code that processes received victim_done messages was ignoring the
> message for the reduced victim because the node couldn't be found in
> the victims list. This caused the latest fencing information to not be
> recorded for the node, causing dlm_controld to wait indefinitely for
> fencing to complete for the reduced victim.
>
> The fix is to simply record the information from a victim_done message
> even if the node is not in the victims list.
>
> bz 678704
>
> Signed-off-by: David Teigland <teigland at redhat.com>
> ---
> fence/fenced/cpg.c | 18 ++++++++++++------
> 1 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/fence/fenced/cpg.c b/fence/fenced/cpg.c
> index a8629b9..99e16a0 100644
> --- a/fence/fenced/cpg.c
> +++ b/fence/fenced/cpg.c
> @@ -652,9 +652,9 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>
>  	node = get_node_victim(fd, id->nodeid);
>  	if (!node) {
> +		/* see comment below about no node */
>  		log_debug("receive_victim_done %d:%u no victim nodeid %d",
>  			  hd->nodeid, seq, id->nodeid);
> -		return;
>  	}
>
>  	log_debug("receive_victim_done %d:%u remove victim %d time %llu how %d",
> @@ -670,9 +670,11 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  	if (hd->nodeid == our_nodeid) {
>  		/* sanity check, I don't think this should happen;
>  		   see comment in fence_victims() */
> -		if (!node->local_victim_done)
> -			log_error("expect local_victim_done");
> -		node->local_victim_done = 0;
> +		if (node) {
> +			if (!node->local_victim_done)
> +				log_error("expect local_victim_done");
> +			node->local_victim_done = 0;
> +		}
>  	} else {
>  		/* save details of fencing operation from master, which
>  		   master saves at the time it completes it */
> @@ -680,8 +682,12 @@ static void receive_victim_done(struct fd *fd, struct fd_header *hd, int len)
>  			id->fence_how, id->fence_time);
>  	}
>
> -	list_del(&node->list);
> -	free(node);
> +	/* we can have no node when reduce_victims() removes it, bz 678704 */
> +
> +	if (node) {
> +		list_del(&node->list);
> +		free(node);
> +	}
>  }
>
>  /* we know that the quorum value here is consistent with the cpg events
> --
> 1.7.1.1