[Linux-cluster] IP monitor failing periodically

Chris Harms chris at cmiware.com
Sat Jun 30 18:41:03 UTC 2007


I am experiencing periodic failovers due to a floating IP address not 
passing the status check:

clurgmgrd: [9975]: <warning> Failed to ping 192.168.13.204
Jun 30 11:41:47 nodeA clurgmgrd[9975]: <notice> status on ip "192.168.13.204" returned 1 (generic error)

Both nodes have bonded NICs with gigabit connections to redundant 
switches, so it is unlikely that the links are going down, and there is 
nothing in the logs about Linux losing them.  I parked all the cluster 
services (2 Postgres services and 1 Apache) on one node and let them run 
overnight.  No client activity was expected during this time.  One 
Postgres service failed twice in this manner and the other failed once.  
The Apache service did not fail.
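
To see which addresses fail and how often, the failed status checks can 
be tallied from syslog.  The following is only a minimal sketch in 
Python, assuming the messages keep the format of the excerpt above.  The 
function name count_ip_failures and the regular expression are 
illustrative and not part of rgmanager.

```python
import re
from collections import Counter

# Matches status-check results as they appear in the log excerpt, e.g.
#   status on ip "192.168.13.204" returned 1 (generic error)
# The pattern is an assumption derived from that single sample line.
STATUS_RE = re.compile(r'status on ip\s+"([0-9.]+)"\s+returned\s+(\d+)')

def count_ip_failures(lines):
    """Return a Counter mapping IP address -> number of non-zero status results."""
    failures = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        # Only a non-zero return code counts as a failed check.
        if match and int(match.group(2)) != 0:
            failures[match.group(1)] += 1
    return failures
```

Passing it an open log file (for example /var/log/messages, if that is 
where syslog writes on the node) gives a per-address failure count that 
can be compared against the services that failed over.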

What can I do to resolve this, or to get more information out of the 
system to diagnose it?

Thanks in advance,
Chris



