[Linux-cluster] RHEL5.3 / cman-2.0.98-1.el5 / Problem loop on "Node x is undead"

Wed Feb 25 14:52:08 UTC 2009

Alain.Moulle wrote:
> Hi,
> 
> I'm again facing this problem of a node being evicted and then reported as undead ...
> And I really don't know what to do ... below are the traces in syslog.
> My version is: RHEL5.3 / cman-2.0.98-1.el5
> 
> Feb 25 14:33:33 s_sys at xn3 qdiskd[27582]: <notice> Writing eviction
> notice for node 2
> Feb 25 14:33:34 s_sys at xn3 qdiskd[27582]: <notice> Node 2 evicted
> Feb 25 14:33:35 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> ... etc.
> Feb 25 14:33:45 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:45 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:46 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:46 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:47 s_kernel at xn3 kernel: dlm: closing connection to node 2
> Feb 25 14:33:47 s_sys at xn3 fenced[27785]: xn4 not a cluster member after
> 0 sec post_fail_delay
> Feb 25 14:33:47 s_sys at xn3 fenced[27785]: fencing node "xn4"
> Feb 25 14:33:47 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> ...etc.
> Feb 25 14:33:52 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:52 s_sys at xn3 fenced[27785]: fence "xn4" success
> Feb 25 14:33:53 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:53 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:54 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:54 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:54 s_sys at xn3 clurgmgrd[27990]: <notice> Taking over service
> service:lustre_xn4 from down member xn4
> Feb 25 14:33:55 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> .. etc.
> 
> And then, after rebooting xn4, when we try to start the CS on xn4, it
> cannot join the cluster, and we
> must stop the CS on both nodes and start it again on both sides.
> 
> Where could this problem come from? How can I avoid this eviction of
> the node?
> 
> Any help would be much appreciated.

You haven't posted any cman/openais messages, but it's quite possible
you've hit this bug:

https://bugzilla.redhat.com/show_bug.cgi?id=485026

The bug report includes a patch and some links to fixed RPMs.

Chrissie
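
Since the reply points out that no cman/openais messages were posted, here is a minimal sketch of one way to pull them out of the syslog around the eviction. It assumes the line layout seen in the quoted traces (timestamp, host tag, then `daemon[pid]:`) and that openais logs under the tag `openais`; the function name `filter_by_daemon` and the default tag list are hypothetical and should be adjusted to what your syslog actually shows.

```python
import re

# Assumed layout, based on the quoted traces:
#   "<timestamp> <host tag> daemon[pid]: message"
# Lines without a "[pid]" part (e.g. "kernel:") are not matched.
DAEMON_RE = re.compile(r"\s(?P<daemon>[A-Za-z_][\w-]*)\[(?P<pid>\d+)\]:\s")


def filter_by_daemon(lines, daemons=("openais",)):
    """Return the syslog lines emitted by any of the given daemons.

    The default tag "openais" is an assumption, not taken from the thread.
    """
    wanted = set(daemons)
    kept = []
    for line in lines:
        match = DAEMON_RE.search(line)
        if match and match.group("daemon") in wanted:
            kept.append(line.rstrip("\n"))
    return kept
```

For example, `filter_by_daemon(open("/var/log/messages"))` would return the openais lines, assuming that file is where syslog writes on the node; passing `daemons=("openais", "qdiskd", "fenced")` keeps the qdiskd and fenced lines as well, so the ordering of events stays visible.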