[Linux-cluster] IP monitor failing periodically
Lon Hohberger
lhh at redhat.com
Fri Jul 6 16:34:49 UTC 2007
On Sat, Jun 30, 2007 at 01:41:03PM -0500, Chris Harms wrote:
> I am experiencing periodic failovers due to a floating IP address not
> passing the status check:
>
> clurgmgrd: [9975]: <warning> Failed to ping 192.168.13.204
> Jun 30 11:41:47 nodeA clurgmgrd[9975]: <notice> status on ip
> "192.168.13.204" returned 1 (generic error)
>
> Both nodes have bonded NICs with gigabit connections to redundant
> switches, so it is unlikely the links are going down, and there is nothing
> in the logs about Linux losing the links. I parked all the cluster services (2
> Postgres services and 1 Apache) on one node and allowed it to run
> overnight. There would be no client activity during this time. One
> Postgres service failed two times in this manner and the other failed
> once in this manner. The Apache service did not fail.
>
> What can I do to resolve this or get more information out of the system
> to resolve this?
Hmm, with bonded NICs, ip.sh monitors the links of the underlying physical devices.
It's supposed to check each one and not complain as long as at least one link is up.
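Below is a minimal sketch of that "at least one slave link is up" rule. It is not the actual ip.sh code, which may query link state differently (for example with ethtool or mii-tool). It assumes the standard Linux bonding sysfs layout: `/sys/class/net/<bond>/bonding/slaves` lists the slave interfaces, and `/sys/class/net/<iface>/operstate` reports their state. The default bond name `bond0` is hypothetical.

```python
import sys
from pathlib import Path

SYSFS_NET = "/sys/class/net"


def read_slaves(bond, sysfs_root=SYSFS_NET):
    """Return the slave interface names of a bonding master, e.g. ['eth0', 'eth1']."""
    slaves_file = Path(sysfs_root) / bond / "bonding" / "slaves"
    return slaves_file.read_text().split()


def link_is_up(iface, sysfs_root=SYSFS_NET):
    """Treat an interface as up only when its operstate reads 'up'."""
    state_file = Path(sysfs_root) / iface / "operstate"
    try:
        return state_file.read_text().strip() == "up"
    except OSError:
        # A missing or unreadable interface counts as down.
        return False


def bond_has_link(bond, sysfs_root=SYSFS_NET):
    """A bond is usable if at least one of its slave links is up."""
    try:
        slaves = read_slaves(bond, sysfs_root)
    except OSError:
        # Not a bonding master (or no such interface): report no link.
        return False
    return any(link_is_up(s, sysfs_root) for s in slaves)


if __name__ == "__main__":
    # Hypothetical default bond name; pass the real one as the first argument.
    bond = sys.argv[1] if len(sys.argv) > 1 else "bond0"
    ok = bond_has_link(bond)
    print(f"{bond}: {'at least one slave link up' if ok else 'no slave link up'}")
    sys.exit(0 if ok else 1)
```

Passing the sysfs root as a parameter lets the check run against a fake directory tree, which makes it easy to test without real NICs.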
The ping check is a bit odd; you could simply disable it in
/usr/share/cluster/ip.sh.
That is, change the line that runs 'ping' so it runs '/bin/true' instead.
--
Lon Hohberger - Software Engineer - Red Hat, Inc.