[Linux-cluster] CS5 still problem "Node x is undead"
Alain Moulle
Alain.Moulle at bull.net
Thu May 22 14:43:01 UTC 2008
Hi Lon
I've applied the patch (the resulting code is shown below), but it
does not solve the problem.
Is there another patch related to this problem?
Thanks
Regards
Alain Moullé
>> When testing a two-node cluster with a quorum disk, if I power off
>> node 1, node 2 fences node 1 correctly and fails over the service,
>> but both before and after the fence success messages, the log on
>> node 2 contains many messages like these:
>> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
>> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
>> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
>> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
>> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
>> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
>> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
>> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: <alert> Writing eviction notice for node 2
>> Apr 24 11:30:08 s_sys at xn3 qdiskd[13740]: <crit> Node 2 is undead.
http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9
Resulting code in cman/qdisk/main.c after applying the patch:
===========================================================
/*
   Transition from Online -> Evicted
 */
if (ni[x].ni_misses > ctx->qc_tko &&
    state_run(ni[x].ni_status.ps_state)) {

    /*
       Mark our internal views as dead if nodes miss too
       many heartbeats...  This will cause a master
       transition if no live master exists.
     */
    if (ni[x].ni_status.ps_state >= S_RUN &&
        ni[x].ni_seen) {
        clulog(LOG_DEBUG, "Node %d DOWN\n",
               ni[x].ni_status.ps_nodeid);
        ni[x].ni_seen = 0;
    }

    ni[x].ni_state = S_EVICT;
    ni[x].ni_status.ps_state = S_EVICT;
    ni[x].ni_evil_incarnation =
        ni[x].ni_status.ps_incarnation;

    /*
       Write eviction notice if we're the master.
     */
    if (ctx->qc_status == S_MASTER) {
        clulog(LOG_NOTICE,
               "Writing eviction notice for node %d\n",
               ni[x].ni_status.ps_nodeid);
        qd_write_status(ctx, ni[x].ni_status.ps_nodeid,
                        S_EVICT, NULL, NULL, NULL);
        if (ctx->qc_flags & RF_ALLOW_KILL) {
            clulog(LOG_DEBUG, "Telling CMAN to "
                   "kill the node\n");
            cman_kill_node(ctx->qc_ch,
                           ni[x].ni_status.ps_nodeid);
        }
    }

    /* Clear our master mask for the node after eviction */
    if (mask)
        clear_bit(mask, (ni[x].ni_status.ps_nodeid-1),
                  sizeof(memb_mask_t));
    continue;
}
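
To make the control flow of this branch easier to reason about, here is a
minimal standalone sketch of the same Online -> Evicted transition. It is
not the qdiskd code: the struct, the enum values and their ordering, and
the function name check_transition() are hypothetical, and state_run() is
assumed to mean "state is at least the running state". The actual disk
write and the cman_kill_node() call are replaced by a simple counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node states; the ordering is an assumption. */
enum node_state { ST_NONE = 0, ST_EVICT, ST_INIT, ST_RUN, ST_MASTER };

/* Hypothetical, reduced stand-in for the per-node bookkeeping. */
struct node {
    int             id;
    enum node_state state;
    unsigned        misses;            /* consecutive missed heartbeats  */
    unsigned        incarnation;       /* incarnation last read from disk */
    unsigned        evil_incarnation;  /* incarnation recorded at eviction */
    int             seen;
};

/*
 * One pass of the Online -> Evicted check for a single node.
 * Returns 1 if the node was evicted during this pass, 0 otherwise.
 * *notices counts eviction notices that a master would have written.
 */
int check_transition(struct node *n, unsigned tko,
                     int we_are_master, int *notices)
{
    assert(n != NULL && notices != NULL);

    /* Only a running node that exceeded the miss limit is evicted. */
    if (n->misses > tko && n->state >= ST_RUN) {
        n->seen = 0;
        n->state = ST_EVICT;
        /* Remember which incarnation was evicted. */
        n->evil_incarnation = n->incarnation;

        /* Only the master writes the eviction notice. */
        if (we_are_master)
            ++*notices;
        return 1;
    }
    return 0;
}
```

In this sketch the branch fires at most once per eviction, because the node
is no longer in a running state on the next pass. That would be consistent
with the repeated messages in the log coming from a different check: the
"is undead" text does not appear in the excerpt above, and the log shows the
eviction notice at <alert> level while this branch logs it with LOG_NOTICE.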