[Linux-cluster] Testing Failover - Failing in few cases

Wed May 30 15:20:08 UTC 2007

Use up2date, or download the SRPMs from the updates site and rebuild and install them. ;)

Jason

On Wed, May 30, 2007 at 11:15:13AM -0400, James McOrmond wrote:
> Has the cluster suite finally been released for RHEL4u5 at this point? 
> I'm still not seeing it as an ISO download on RHN.
> 
> rhurst at bidmc.harvard.edu wrote:
> >We second that motion -- skip U4 altogether, go directly to U5.
> >
> >
> >On Wed, 2007-05-30 at 16:55 +0200, Hagmann, Michael wrote:
> >
> >>Hi,
> >>
> >>First of all, if you really are running RHEL4 Update 4, you should
> >>update to RHEL4 Update 5 before you do any more testing.
> >>
> >>There are a lot of bugs in RHEL4 CS Update 4!
> >>
> >>Mike
> >>
> >>------------------------------------------------------------------------
> >>
> >>*From:* linux-cluster-bounces at redhat.com 
> >>[mailto:linux-cluster-bounces at redhat.com] *On Behalf Of *Satya Daragani
> >>*Sent:* Monday, 28 May 2007 15:12
> >>*To:* Linux-cluster at redhat.com
> >>*Subject:* [Linux-cluster] Testing Failover - Failing in few cases
> >>
> >>
> >>
> >>Hi Linux-Cluster Team,
> >>
> >>Please help me test failover with RHEL Cluster Suite 4 Update 4. I have
> >>included the details of the cluster nodes and configuration below.
> >>Please suggest how I should proceed.
> >>
> >>IBM Lenovo Thinkcentre with AMD Opteron 64bit processor - Two nodes
> >>
> >>256 MB RAM
> >>
> >>One NIC
> >>
> >> 
> >>
> >>   1. Installed RHEL AS 4 Update 4 on both nodes.
> >>   2. Configured the NIC with addresses in the 192.168.1.x range
> >>      (node1 - 192.168.1.1, node2 - 192.168.1.2).
> >>   3. Configured /etc/hosts.
> >>   4. Installed RHEL Cluster Suite 4 Update 4 on both nodes.
> >>   5. Added both nodes in the cluster manager with one quorum vote.
> >>   6. No fence devices configured (chkconfig --del fenced).
> >>   7. Configured a restricted failover domain, ordered by priority
> >>      (node1 - 1, node2 - 2).
> >>   8. Configured a shared IP address resource (192.168.1.5) with the
> >>      monitor link option enabled.
> >>   9. Created a service named httpd and configured it as follows:
> >>         1. Checked "Autostart this service".
> >>         2. Selected the failover domain configured in the previous
> >>            steps.
> >>         3. Selected Relocate as the recovery policy.
> >>         4. Added the shared resource (the IP address created above)
> >>            and, under it, the private script resource
> >>            (/etc/rc.d/init.d/httpd).
> >>
> >>
> >> 
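
For readers following along, steps 7 through 9 above would correspond
roughly to the resource manager section of /etc/cluster/cluster.conf
sketched below. This is an illustration under stated assumptions, not the
poster's actual file: the failover domain name "httpd-domain" is made up,
the node names are assumed to be node1 and node2 as in /etc/hosts, and the
enclosing <cluster>, <cman> and <clusternodes> elements are omitted.

```xml
<!-- Sketch only: "httpd-domain" is a hypothetical name; node names are assumed -->
<rm>
  <failoverdomains>
    <!-- Step 7: restricted domain, ordered by priority -->
    <failoverdomain name="httpd-domain" ordered="1" restricted="1">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <resources>
    <!-- Step 8: shared IP address with link monitoring enabled -->
    <ip address="192.168.1.5" monitor_link="1"/>
  </resources>
  <!-- Step 9: autostarted service, relocate on failure -->
  <service autostart="1" domain="httpd-domain" name="httpd" recovery="relocate">
    <ip ref="192.168.1.5">
      <!-- Script resource nested under the shared IP -->
      <script file="/etc/rc.d/init.d/httpd" name="httpd"/>
    </ip>
  </service>
</rm>
```

Nesting the script under the IP resource means the address is brought up
before httpd is started and taken down after it is stopped.
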
> >>
> >>Checking the failover:
> >>
> >>1st case
> >>
> >>With the above configured, node1 is the primary node for the
> >>httpd service.
> >>
> >>If I restart node1, the service fails over to node2, and once node1
> >>comes up again the service fails back to node1 (as the priority is
> >>configured).
> >>
> >> 
> >>
> >>2nd case
> >>
> >>node1 is currently running the httpd service. If I bring down the
> >>network interface (ifconfig eth0 down), the httpd service fails over
> >>to node2.
> >>
> >>Then, if I bring the interface back up (ifconfig eth0 up) on node1,
> >>the service does not fail back to node1, and /var/log/messages says
> >>"unable to contact the cluster infrastructure". *Need your help here.*
> >>
> >> 
> >>
> >>If I restart the cluster services on node1, the service starts on
> >>node1 again.
> >>
> >> 
> >>
> >>3rd case
> >>
> >>node1 is currently running the httpd service. If I pull the power
> >>cord (that is, an improper shutdown), the service goes into recovery
> >>mode and does not start on node2. *Need your help here.*
> >>
> >> 
> >>
> >>4th case
> >>
> >>node1 is currently running the httpd service. If I stop or kill the
> >>httpd daemon (service httpd stop, or killall), failover does not
> >>happen. *Need your help here.*
> >>
> >> 
> >>
> >>-- 
> >>Thanx
> >>Satya Daragani
> >>satya.daragani at gmail.com <mailto:satya.daragani at gmail.com>
> >>+91 98850 58366 
> >
> >>--
> >>Linux-cluster mailing list
> >>Linux-cluster at redhat.com <mailto:Linux-cluster at redhat.com>
> >>https://www.redhat.com/mailman/listinfo/linux-cluster
> >>
> >*Robert Hurst, Sr. Caché Administrator*
> >*Beth Israel Deaconess Medical Center*
> >*1135 Tremont Street, REN-7*
> >*Boston, Massachusetts   02120-2140*
> >*617-754-8754 · Fax: 617-754-8730 · Cell: 401-787-3154*
> >Any technology distinguishable from magic is insufficiently advanced.
> >
> >
> 
> -- 
> James A. McOrmond (jamesm at xandros.com)
> Hardware QA Lead & Network Administrator
> Xandros Corporation, Ottawa, Canada.
> Morpheus: ...after a century of war I remember that which matters most:
>  *We are still HERE!*
> 

-- 
================================================
|    Jason Welsh   jason at monsterjam.org        |
| http://monsterjam.org    DSS PGP: 0x5E30CC98 |
|    gpg key: http://monsterjam.org/gpg/       |
================================================