[Linux-cluster] [UPDATE] IP monitor failing periodically
Chris Harms
chris at cmiware.com
Sat Jul 21 20:41:04 UTC 2007
We reinstalled our machines with RHEL 5 x86_64 (we were running i386) a
few weeks ago, and the mysterious IP monitoring failures have disappeared.
I believe it was suggested earlier that a compiler bug involving -fpie
might be causing segfaults in i386 binaries. Our result supports that
theory to some degree, although I did not try to confirm it further. I
thought it was worth noting that the architecture change stopped the
random failovers.
### previous thread below
Hi Chris,
I am experiencing the same problem on RHEL 5 and I have a support
request in with Red Hat.
I was asked to increase the debug level by changing the <rm> line in the
cluster configuration to:
<rm log_facility="local4" log_level="7">
I then needed to add "local4.* /var/log/cluster" to /etc/syslog.conf and
run "service syslog restart".
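
The syslog step above can be sketched as a small idempotent shell
helper. This is only a sketch: the function name add_cluster_log_rule is
hypothetical, and it takes the config file path as an argument so it can
be tried on a copy before touching /etc/syslog.conf.

```shell
# Hypothetical helper (name is illustrative): append the local4 rule to a
# syslog config file unless an identical line is already present.
add_cluster_log_rule() {
    conf="$1"
    rule='local4.* /var/log/cluster'
    # -F fixed string, -x whole-line match, -q quiet
    grep -Fxq "$rule" "$conf" 2>/dev/null || printf '%s\n' "$rule" >> "$conf"
}

# On a real node (as root), followed by the restart mentioned in the text:
#   add_cluster_log_rule /etc/syslog.conf
#   service syslog restart
```

Running the helper twice leaves a single copy of the rule, so it is safe
to re-run after editing the file by hand.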
To update the cluster configuration I needed to propagate the cluster
configuration to both nodes:
# ccs_tool update /etc/cluster/cluster.conf
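
Before running the update it may help to check the config_version
attribute in cluster.conf. My understanding, which should be checked
against the ccs_tool man page, is that the update is only accepted when
that number is higher than the one currently in use. The helper name
get_config_version below is hypothetical, and the sed pattern assumes the
attribute sits on the same line as the opening <cluster tag.

```shell
# Hypothetical helper (name is illustrative): print the config_version
# attribute from the <cluster ...> element of a cluster.conf file.
# Assumes the attribute is on the same line as the opening <cluster tag.
get_config_version() {
    sed -n 's/.*<cluster[^>]*config_version="\([0-9][0-9]*\)".*/\1/p' "$1" | head -n 1
}

# Example on a real node:
#   get_config_version /etc/cluster/cluster.conf
```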
In the week since increasing the logging I have not had the problem,
although before that it occurred regularly, 2 to 3 times a day. One day
last week, out of curiosity, I reverted to the default settings, and
within a few hours I had the "Failed to ping" error on one of the
clustered IP addresses and the service was restarted.
I now have the logging back at 7 and the support request is pending.
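
To compare how often the failure shows up with and without the extra
logging, the messages can be tallied per IP address. This is a rough
sketch: the function name count_ping_failures is hypothetical, the log
file to scan is passed as an argument, and it assumes the IP address is
the last field on each "Failed to ping" line, as in the excerpt quoted
below.

```shell
# Hypothetical helper (name is illustrative): count "Failed to ping"
# messages per IP address in a log file. Assumes the address is the last
# whitespace-separated field on the line.
count_ping_failures() {
    awk '/Failed to ping/ { n[$NF]++ } END { for (ip in n) print ip, n[ip] }' "$1" | sort
}

# Example on a real node:
#   count_ping_failures /var/log/messages
```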
Regards
--
David Schroeder
Server Support
Information Services Division
Flinders University
Adelaide, Australia
Ph: +61 8 8201 2689
Chris Harms wrote:
> I am experiencing periodic failovers due to a floating IP address not
> passing the status check:
>
> clurgmgrd: [9975]: <warning> Failed to ping 192.168.13.204
> Jun 30 11:41:47 nodeA clurgmgrd[9975]: <notice> status on ip
> "192.168.13.204" returned 1 (generic error)
>
> Both nodes have bonded NICs with gigabit connections to redundant
> switches, so it is unlikely they are going down, and there is nothing in
> the logs about Linux losing the links. I parked all the cluster services - 2
> Postgres services and 1 Apache - on one node and allowed it to run
> overnight. There would be no client activity during this time. One
> Postgres service failed two times in this manner and the other failed
> once in this manner. The Apache service did not fail.
>
> What can I do to resolve this or get more information out of the system
> to resolve this?
>
> Thanks in advance,
> Chris
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster