[Linux-cluster] RHEL5.3 / cman-2.0.98-1.el5 / Problem loop on "Node x is undead"
Chrissie Caulfield
ccaulfie at redhat.com
Wed Feb 25 14:52:08 UTC 2009
Alain.Moulle wrote:
> Hi,
>
> I'm again facing this problem of a node being evicted and then reported
> as undead, and I really don't know what to do. The syslog traces are below.
> My version is: RHEL5.3 / cman-2.0.98-1.el5
>
> Feb 25 14:33:33 s_sys at xn3 qdiskd[27582]: <notice> Writing eviction
> notice for node 2
> Feb 25 14:33:34 s_sys at xn3 qdiskd[27582]: <notice> Node 2 evicted
> Feb 25 14:33:35 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> ... etc.
> Feb 25 14:33:45 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:45 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:46 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:46 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:47 s_kernel at xn3 kernel: dlm: closing connection to node 2
> Feb 25 14:33:47 s_sys at xn3 fenced[27785]: xn4 not a cluster member after
> 0 sec post_fail_delay
> Feb 25 14:33:47 s_sys at xn3 fenced[27785]: fencing node "xn4"
> Feb 25 14:33:47 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> ...etc.
> Feb 25 14:33:52 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:52 s_sys at xn3 fenced[27785]: fence "xn4" success
> Feb 25 14:33:53 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:53 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:54 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> Feb 25 14:33:54 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice
> for node 2
> Feb 25 14:33:54 s_sys at xn3 clurgmgrd[27990]: <notice> Taking over service
> service:lustre_xn4 from down member xn4
> Feb 25 14:33:55 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
> .. etc.
>
> And then, after rebooting xn4, when we try to start the CS on xn4, it
> can't join the cluster, and we
> must stop the CS on both nodes and start it again on both sides.
>
> Where could this problem come from? How can I avoid this node
> eviction?
>
> Any help would be much appreciated.
You haven't posted any cman/openais messages but it's quite possible
you've hit this bug:
https://bugzilla.redhat.com/show_bug.cgi?id=485026
There's a patch included and some links to fixed RPMs.
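
To collect the cman/openais messages that are missing from the excerpt above, something like the following sketch could be used to pull lines for specific daemons out of a syslog file. The function name filter_by_daemon is hypothetical, and the tag names to pass in (for example "openais") are assumptions about how those daemons identify themselves in your syslog, so check them against your own logs. The sample lines are taken from the traces quoted above.

```python
import re


def filter_by_daemon(lines, daemons):
    """Return the lines whose syslog tag (e.g. 'qdiskd[27582]:') names one of `daemons`.

    `daemons` is a list of tag names; which names are relevant
    (e.g. "openais") is an assumption to verify on the affected system.
    """
    names = "|".join(re.escape(d) for d in daemons)
    # Match "name:" or "name[pid]:" as a whole word anywhere in the line.
    tag = re.compile(r"\b(?:%s)(?:\[\d+\])?:" % names)
    return [line for line in lines if tag.search(line)]


# Two lines copied from the syslog excerpt in this thread.
sample = [
    'Feb 25 14:33:35 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.',
    'Feb 25 14:33:47 s_sys at xn3 fenced[27785]: fencing node "xn4"',
]

for line in filter_by_daemon(sample, ["fenced"]):
    print(line)
```

In practice the lines would be read from the syslog file on each node rather than from a hard-coded list, and the output from both nodes for the same time window posted together.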
Chrissie