[Linux-cluster] RHEL5.3 / cman-2.0.98-1.el5 / Problem loop on "Node x is undead"

Alain.Moulle Alain.Moulle at bull.net
Wed Feb 25 14:41:56 UTC 2009


Hi,

I'm facing this problem again: a node is evicted and is then reported as "undead" in a loop,
and I really don't know what to do. Below are the syslog traces.
My version is: RHEL5.3 / cman-2.0.98-1.el5

Feb 25 14:33:33 s_sys at xn3 qdiskd[27582]: <notice> Writing eviction notice for node 2
Feb 25 14:33:34 s_sys at xn3 qdiskd[27582]: <notice> Node 2 evicted
Feb 25 14:33:35 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
... etc.
Feb 25 14:33:45 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
Feb 25 14:33:45 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
Feb 25 14:33:46 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
Feb 25 14:33:46 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
Feb 25 14:33:47 s_kernel at xn3 kernel: dlm: closing connection to node 2
Feb 25 14:33:47 s_sys at xn3 fenced[27785]: xn4 not a cluster member after 0 sec post_fail_delay
Feb 25 14:33:47 s_sys at xn3 fenced[27785]: fencing node "xn4"
Feb 25 14:33:47 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
...etc.
Feb 25 14:33:52 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
Feb 25 14:33:52 s_sys at xn3 fenced[27785]: fence "xn4" success
Feb 25 14:33:53 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
Feb 25 14:33:53 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
Feb 25 14:33:54 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
Feb 25 14:33:54 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2
Feb 25 14:33:54 s_sys at xn3 clurgmgrd[27990]: <notice> Taking over service service:lustre_xn4 from down member xn4
Feb 25 14:33:55 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.
... etc.
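To see at a glance whether the "undead" loop keeps going after fenced reports success (as it does in the traces above), a small log scan can help. This is a minimal Python sketch, not part of any cluster tool: the helper name `summarize` and the regexes are hypothetical, and they assume the message shapes exactly as they appear in the syslog excerpt above.

```python
import re
from collections import Counter

# Hypothetical patterns, written against the qdiskd/fenced lines quoted above.
QDISK_RE = re.compile(
    r"qdiskd\[\d+\]: <\w+> (Node \d+ is undead|Writing eviction notice for node \d+)"
)
FENCE_OK_RE = re.compile(r'fenced\[\d+\]: fence "([^"]+)" success')


def summarize(lines):
    """Count qdiskd 'undead' and 'eviction' messages before/after the first fence success."""
    counts = Counter()
    fenced = False
    for line in lines:
        if FENCE_OK_RE.search(line):
            fenced = True  # every qdiskd message from here on is counted as post-fence
            continue
        match = QDISK_RE.search(line)
        if not match:
            continue
        kind = "undead" if "undead" in match.group(1) else "eviction"
        phase = "after_fence" if fenced else "before_fence"
        counts[(kind, phase)] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the traces above, used as a built-in sample.
    sample = [
        "Feb 25 14:33:45 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.",
        "Feb 25 14:33:45 s_sys at xn3 qdiskd[27582]: <alert> Writing eviction notice for node 2",
        'Feb 25 14:33:52 s_sys at xn3 fenced[27785]: fence "xn4" success',
        "Feb 25 14:33:53 s_sys at xn3 qdiskd[27582]: <crit> Node 2 is undead.",
    ]
    for (kind, phase), n in sorted(summarize(sample).items()):
        print(kind, phase, n)
```

Non-zero "after_fence" counts mean qdiskd still sees the fenced node's slot as active on the quorum disk after the fence completed, which is the pattern described in this post.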

And then, after xn4 reboots, when we try to start CS on xn4, it
can't join the cluster, and we
must stop CS on both nodes and start it again on both sides.

Where could this problem come from? How can I prevent this node
from being evicted?

Any help would be much appreciated.
Thanks
Regards
Alain