[Linux-cluster] CS5 still problem "Node x is undead" (contd.)
Alain Moulle
Alain.Moulle at bull.net
Mon May 26 08:33:11 UTC 2008
Hi
As mentioned before, the patch:
http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9
does not solve the problem for my configuration ...
Just an idea/question: could this problem also be linked
to the default value of token? Or does it have nothing to do with it?
I ask because I currently hit this problem with a quorum disk
configured and no token setting in cluster.conf, so token
is at its default value ...
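To rule out the default token timeout, it can be set explicitly in cluster.conf. A minimal sketch, assuming illustrative values only (the 21000 ms token and the quorumd interval/tko/label below are assumptions, not recommendations); the usual guidance is that the CMAN/totem token timeout should be longer than the quorum disk timeout (interval x tko seconds) so qdiskd can evict a node before CMAN times it out:

```xml
<!-- Illustrative fragment only; all values are assumptions.
     token is in milliseconds; the qdiskd timeout here is
     interval (1 s) * tko (10) = 10 s, so token=21000 leaves margin. -->
<cman ... >
  <totem token="21000"/>
</cman>
<quorumd interval="1" tko="10" votes="1" label="myqdisk"/>
```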
Thanks
Regards
Alain Moullé
> Hi Lon
> I've applied the patch (see resulting code below) but the patch
> does not solve the problem.
> Is there another patch related to this problem?
> Thanks
> Regards
> Alain Moullé
>>>> When testing a two-node cluster with a quorum disk: when
>>>> I power off node 1, node 2 fences node 1 correctly and
>>>> fails over the service, but in the log of node 2, before and after
>>>> the fence success messages, I see many messages like this:
>>>> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
>>>> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
>>>> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
>>>> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
>>>> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
>>>> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
>>>> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
>>>> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
>>>> Apr 24 11:30:08 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
Resulting code after patch application in cman/qdisk/main.c:
===========================================================
/*
   Transition from Online -> Evicted
*/
if (ni[x].ni_misses > ctx->qc_tko &&
    state_run(ni[x].ni_status.ps_state)) {

	/*
	   Mark our internal views as dead if nodes miss too
	   many heartbeats...  This will cause a master
	   transition if no live master exists.
	 */
	if (ni[x].ni_status.ps_state >= S_RUN &&
	    ni[x].ni_seen) {
		clulog(LOG_DEBUG, "Node %d DOWN\n",
		       ni[x].ni_status.ps_nodeid);
		ni[x].ni_seen = 0;
	}

	ni[x].ni_state = S_EVICT;
	ni[x].ni_status.ps_state = S_EVICT;
	ni[x].ni_evil_incarnation =
		ni[x].ni_status.ps_incarnation;

	/*
	   Write eviction notice if we're the master.
	 */
	if (ctx->qc_status == S_MASTER) {
		clulog(LOG_NOTICE,
		       "Writing eviction notice for node %d\n",
		       ni[x].ni_status.ps_nodeid);
		qd_write_status(ctx, ni[x].ni_status.ps_nodeid,
				S_EVICT, NULL, NULL, NULL);
		if (ctx->qc_flags & RF_ALLOW_KILL) {
			clulog(LOG_DEBUG, "Telling CMAN to "
			       "kill the node\n");
			cman_kill_node(ctx->qc_ch,
				       ni[x].ni_status.ps_nodeid);
		}
	}

	/* Clear our master mask for the node after eviction */
	if (mask)
		clear_bit(mask, (ni[x].ni_status.ps_nodeid-1),
			  sizeof(memb_mask_t));
	continue;
}