[Linux-cluster] RE: Cluster Suite 4 failover problem

Dicky dicky_nnc at yahoo.com.hk
Sat Oct 21 13:59:19 UTC 2006


Hi Jeff & Lon,

Thanks for the reply.

Regarding the failover that did not happen (the status showed "Owner --> 
unknown" and "State --> started", but none of the services were actually 
available): I checked the log and agree that fence_manual is the cause. 
The log shows that fence_manual was waiting for node2 to rejoin the 
cluster. As soon as I ran the command "fence_ack_manual -n node2", the 
failed services failed over to node1 and all of them returned to normal.

Is there any solution or workaround for this situation other than buying 
a fence device? :) Can I remove the fence.rpm? Would that cause any 
other problems? I ask because in a production environment we never know 
when a machine will go down, so we cannot count on running 
fence_ack_manual immediately.

========/var/log/messages======
kernel: CMAN: removing node node2 from the cluster : Missed too many 
heartbeats
fenced[2447]: node2 not a cluster member after 0 sec post_fail_delay
fenced[2447]: fencing node "node2"
fence_manual: Node node2 needs to be reset before recovery can procede.  
Waiting for node2 to rejoin the cluster or for manual acknowledgement 
that it has been reset (i.e. fence_ack_manual -n node2)
=======END================
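
Until a real fence device is in place, one thing that might at least
shorten the wait is to watch the log for this message and alert an
operator. Below is a minimal sketch in Python, assuming the message
format shown in the excerpt above. The function name is made up for
illustration. It only reports the node and does not run
fence_ack_manual, since the node really has to be reset before the
fence is acknowledged.

```python
import re

# Matches the fence_manual message from the log excerpt. The excerpt is
# wrapped by the mail client; in the log file itself it is a single line.
WAITING = re.compile(r"fence_manual: Node (\S+) needs to be reset")


def nodes_waiting_for_ack(lines):
    """Return the nodes fence_manual is waiting on, in the order first seen."""
    nodes = []
    for line in lines:
        match = WAITING.search(line)
        if match and match.group(1) not in nodes:
            nodes.append(match.group(1))
    return nodes


# Sample lines taken from the excerpt above.
sample = [
    'fenced[2447]: fencing node "node2"',
    "fence_manual: Node node2 needs to be reset before recovery can procede.",
]
print(nodes_waiting_for_ack(sample))
```

In practice the lines would come from /var/log/messages rather than a
hard-coded list.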


Regarding the monitor_link issue: I set monitor_link=1 for both IP 
resources, i.e. 192.168.0.111 and 192.168.0.112, then shut down eth0 on 
node2 and re-enabled it. When I tried to restart rgmanager on node2 (the 
failed node), it still showed the message "Shutting down Cluster Service 
Manager... Waiting for services to stop:", and I had to kill the 
rgmanager processes or, even worse, reset the machine. Any ideas?

One more thing: even with monitor_link=0 in cluster.conf, the Monitor 
Link box under system-config-cluster --> Resource --> IP address is 
still ticked. Why?
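
To rule out a mismatch between the file and the GUI, the values actually
stored in cluster.conf can be read directly. Below is a rough sketch,
assuming the IP resources are <ip> elements carrying address and
monitor_link attributes. The sample fragment and the function name are
hypothetical and are not copied from my real configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster.conf fragment; only the two IP resources are shown.
SAMPLE_CONF = """
<cluster name="example" config_version="1">
  <rm>
    <resources>
      <ip address="192.168.0.111" monitor_link="0"/>
      <ip address="192.168.0.112" monitor_link="0"/>
    </resources>
  </rm>
</cluster>
"""


def monitor_link_settings(conf_text):
    """Map each <ip> address to its monitor_link value (None if not set)."""
    root = ET.fromstring(conf_text)
    return {ip.get("address"): ip.get("monitor_link") for ip in root.iter("ip")}


print(monitor_link_settings(SAMPLE_CONF))
```

If the file really says 0 for both addresses, then the ticked box would
be a display issue in the GUI and not a configuration error.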

Many thanks,
Dicky



