[Linux-cluster] Cluster service restarting

Fri Jun 22 20:40:43 UTC 2007

On Thu, Jun 21, 2007 at 09:28:48AM +0930, David Schroeder wrote:
> Hi,
> 
> We have been running web and database clusters successfully for several 
> years on RHEL 3 and 4 and we now have one of each on RHEL 5.
> 
> The setup is very straightforward: 2 nodes, active/active, with one
> running the webserver and the other the databases.
> 
> We have found that the services restart in place regularly, sometimes
> up to 2 or 3 times a day. The cause, evident from the log entries, is a
> failure to ping one or another of the clustered service IP addresses.
> This happens less frequently on the database server, which has one
> clustered interface, than on the webserver, which has 5. The ping
> failure reported in the webserver's logs is not always on the same IP
> address; it seems quite random, both in timing and in which IP address
> it reports as at fault. There are no load-related issues, as this is
> still in the testing stage.
> 
> I have turned the "Monitor Link" setting off and it still happens.
> 
> Are there any settings that will increase the timeout? I'm sure the
> interface does not go down.
> 
> Any other pointers or suggestions?

You can disable the check by removing these lines from /usr/share/cluster/ip.sh:

        <!-- Checks to see if we can ping the IP address locally -->
        <action name="status" depth="10" interval="60" timeout="20"/>
        <action name="monitor" depth="10" interval="60" timeout="20"/>
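If you would rather script the edit than do it by hand, here is a minimal Python sketch. It is not part of rgmanager: the names `strip_ping_checks` and `strip_file` are hypothetical, and it assumes the three lines appear in ip.sh exactly as quoted above (indentation aside). It removes only those lines and keeps a backup copy of the original file.

```python
# Minimal sketch (hypothetical helper, not shipped with rgmanager): drop the
# depth-10 "ping" status/monitor actions from the ip.sh resource agent.

# The lines quoted above, compared after stripping surrounding whitespace so
# that the agent's own indentation does not matter.
PING_CHECK_LINES = {
    "<!-- Checks to see if we can ping the IP address locally -->",
    '<action name="status" depth="10" interval="60" timeout="20"/>',
    '<action name="monitor" depth="10" interval="60" timeout="20"/>',
}


def strip_ping_checks(text: str) -> str:
    """Return text with the ping-check comment and its two actions removed."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if line.strip() not in PING_CHECK_LINES
    ]
    return "".join(kept)


def strip_file(path: str) -> None:
    """Rewrite path in place, saving the untouched original as path + '.orig'."""
    with open(path) as f:
        original = f.read()
    with open(path + ".orig", "w") as f:
        f.write(original)
    with open(path, "w") as f:
        f.write(strip_ping_checks(original))
```

For example, `strip_file("/usr/share/cluster/ip.sh")` run as root. Matching whole lines exactly, rather than anything containing depth="10", leaves every other action in the agent untouched.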

Then increment the config_version in your /etc/cluster/cluster.conf and
redistribute the configuration file using ccs_tool update. This will cause
rgmanager to stop doing the 'ping' checks.
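The config_version bump can be scripted too. This Python sketch uses a regular expression rather than an XML parser so that comments, the XML declaration and the file's layout survive unchanged. `bump_config_version` is a hypothetical name, and the sketch assumes config_version is a plain integer attribute on the top-level <cluster> element.

```python
import re

# Matches config_version="N" (or 'N') inside the <cluster ...> start tag only;
# the \b after "cluster" keeps it from matching <clusternodes> or <clusternode>.
_VERSION_RE = re.compile(r'<cluster\b[^>]*?\bconfig_version=(["\'])(\d+)\1')


def bump_config_version(xml_text: str) -> str:
    """Return xml_text with <cluster config_version> incremented by one."""
    match = _VERSION_RE.search(xml_text)
    if match is None:
        raise ValueError("no config_version attribute on the <cluster> element")
    new_version = int(match.group(2)) + 1
    # Splice in the new number; every other byte of the file is preserved.
    return xml_text[: match.start(2)] + str(new_version) + xml_text[match.end(2):]
```

After writing the result back to /etc/cluster/cluster.conf, the ccs_tool update step still has to be run, e.g. `ccs_tool update /etc/cluster/cluster.conf`.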

-- Lon

-- 
Lon Hohberger - Software Engineer - Red Hat, Inc.