[Linux-cluster] Testing Failover - Failing in few cases
Lon Hohberger
lhh at redhat.com
Tue Jun 12 13:46:18 UTC 2007
On Mon, May 28, 2007 at 06:42:08PM +0530, Satya Daragani wrote:
> Hi Linux-Cluster Team,
>
> Please help me in testing the failover with the RHEL Cluster Suite 4 with
> update 4. I am appending the details related to cluster nodes and
> configuration here. Kindly suggest me how to proceed further.
>
> 6. No fence devices configured (chkconfig --del fenced)
This breaks 2-node mode, which depends on fencing.
> 2nd case
>
> Currently node1 is running the httpd service, if I down the network
> interface (ifconfig eth0 down), the httpd service is failing over to the
> node2.
>
> Then if I bring the interface back up (ifconfig eth0 up) on node1, the
> service does not fail back to node1, and /var/log/messages says "unable
> to contact the cluster infrastructure". *Need your help here*
CMAN doesn't die gracefully in this case; fencing needs to happen to fix
it.
> 3rd case
>
> Currently node1 is running the HTTPd service, if I remove the powercord (I
> mean the improper shutdown), the service is going to the recovery mode and
> not getting started on the node2. *Need your help here.*
Are there any log messages here? I'm not sure if lack of fencing might
cause this; I suspect not. It shouldn't stick in recovery mode.
> Currently node1 is running the httpd service, if I stop or killall the httpd
> service (service httpd stop) failover is not happening. *Need your help
> here.*
That's a bug in the script resource agent. Edit
/usr/share/cluster/script.sh and change the 'interval' values for the
monitor and status actions from 3600 seconds (1 hour) to 10 seconds,
e.g.:

    interval="10"

That should fix it.
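
As a rough sketch of that edit (not the actual contents of script.sh): the
three action lines below are a made-up sample, and the real lines in your
copy may be formatted differently, so look at the file by hand before
changing anything. The sed expression only touches lines that name the
status or monitor action, and it writes to a scratch copy rather than
editing in place. Older sed versions may want -r instead of -E.

```shell
# Hypothetical excerpt standing in for the action lines in
# /usr/share/cluster/script.sh; the real file may differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<action name="start" timeout="0"/>
<action name="status" interval="3600" timeout="0"/>
<action name="monitor" interval="3600" timeout="0"/>
EOF

# Set interval="10" only on the status and monitor action lines,
# leaving every other line untouched. Output goes to a new file.
sed -E '/name="(status|monitor)"/ s/interval="[^"]*"/interval="10"/' \
    "$sample" > "$sample.new"

# Show the result for review.
cat "$sample.new"
```

If the output looks right against a copy of the real file, the same
expression could be applied there, after backing up the original.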
Can I convince you to upgrade at least:

    rgmanager
    ccs
    magma
    magma-plugins

...to the update 5 releases?
-- Lon
--
Lon Hohberger - Software Engineer - Red Hat, Inc.