[Linux-cluster] Cluster Suite 4 failover problem
Jonathan Daniels
jon.daniels at voxsurf.com
Thu Oct 19 15:45:44 UTC 2006
Hi,
What is being written to /var/log/messages on each node? That should
provide a clue as to what the problem is. Also, did you install the
'fence' RPM and any Clustered LVM / GFS RPMs?
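
If the logs are long, something like the following might help pull out the cluster-related lines. This is only a rough sketch: the tag names in CLUSTER_TAGS are my assumption about what the fencing, membership and resource-manager daemons write, and scan_log is a hypothetical helper, so adjust both to match what you actually see.

```python
import re

# Assumed log tags for the cluster daemons; edit to match your own logs.
CLUSTER_TAGS = ("fenced", "clurgmgrd", "CMAN", "ccsd")


def cluster_lines(lines, tags=CLUSTER_TAGS):
    """Return the lines that mention any of the given tags (case-sensitive)."""
    pattern = re.compile("|".join(re.escape(tag) for tag in tags))
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_log(path="/var/log/messages", tags=CLUSTER_TAGS):
    """Read a syslog file and return only its cluster-related lines."""
    with open(path, errors="replace") as log:
        return cluster_lines(log, tags)
```

Running scan_log() as root on both nodes and comparing the results around the time eth0 went down should show whether a fence operation was started and whether it ever completed.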
You might also consider rebooting the "downed" node. Fencing devices
normally take care of this automatically, but as I understand it,
"manual fencing" means you have to reboot the node yourself :). The
assumption is that a failed node won't be allowed back into the
cluster until it has been restarted.
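
To double-check which nodes depend on manual fencing, a small script along these lines could read cluster.conf and list the fence agent behind each node. It is a sketch based only on the element and attribute names in the config you posted; the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def fence_agents_by_node(conf_xml):
    """Map each clusternode name to the fence agents its methods reference."""
    root = ET.fromstring(conf_xml)
    # fencedevice name -> agent, e.g. "Fence" -> "fence_manual"
    agents = {dev.get("name"): dev.get("agent") for dev in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        devices = node.findall("./fence/method/device")
        # A device name with no matching <fencedevice> is reported as "undefined".
        result[node.get("name")] = [agents.get(d.get("name"), "undefined") for d in devices]
    return result


def nodes_with_manual_fencing(conf_xml):
    """Return the sorted names of nodes whose fencing relies on fence_manual."""
    by_node = fence_agents_by_node(conf_xml)
    return sorted(name for name, used in by_node.items() if "fence_manual" in used)
```

Pass it the text of the file, e.g. nodes_with_manual_fencing(open("/etc/cluster/cluster.conf").read()), assuming that is where your config lives. Any node it lists will wait for a person to intervene after a failure rather than being power-cycled automatically.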
Thanks,
Jon
Dicky wrote:
> Hi All,
>
> I have two machines, node1 (192.168.0.27) and node2 (192.168.0.28),
> each with one NIC, running Red Hat Cluster Suite 4 with DLM. I have
> created a manual fence, a failover domain, and two services: "www"
> (listening address 192.168.0.111) and "ftp" (listening address
> 192.168.0.112).
>
> After the initial setup, everything seemed to work fine: I can
> manually relocate the services from node1 to node2 and vice versa,
> and stop and start them.
>
> But when I tried to test the failover capability, i.e. shutting down
> the network on one node (e.g. bringing down eth0 on node1), failover
> did not work most of the time. This is the scenario I tested:
>
> Scenario: services running on node1, then I shut down eth0 on node1.
>
> Result: the services did not fail over to node2, and clustat on node1
> shows:
>
> Member Status: Quorate
>
>   Member Name        Status
>   ------ ----        ------
>   node1              Offline
>   node2              Online, Local, rgmanager
>
>   Service Name       Owner (Last)       State
>   ------- ----       ----- ------       -----
>   ftp                unknown            started
>   www                unknown            started
>
> Both services were no longer working. Restarting eth0 and then the
> cman service on node1 did not help. When I tried to restart
> rgmanager on node1, it only printed "Waiting for services to stop: "
> and waited forever. Even killing the rgmanager process did not work.
> In the end, I had to reset both machines to get the cluster back to
> normal.
>
> I would appreciate any help, or hearing from anyone who has run into
> this before. I have attached the cluster.conf below for reference.
>
> ======cluster.conf=========
> <?xml version="1.0"?>
> <cluster config_version="34" name="alpha_cluster">
>   <fence_daemon post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="node1" votes="1">
>       <fence>
>         <method name="1">
>           <device name="Fence" nodename="node1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="node2" votes="1">
>       <fence>
>         <method name="1">
>           <device name="Fence" nodename="node2"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="1" two_node="1"/>
>   <fencedevices>
>     <fencedevice agent="fence_manual" name="Fence"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="aaa" ordered="0" restricted="0">
>         <failoverdomainnode name="node1" priority="1"/>
>         <failoverdomainnode name="node2" priority="1"/>
>       </failoverdomain>
>     </failoverdomains>
>     <resources>
>       <ip address="192.168.0.111" monitor_link="0"/>
>       <script file="/etc/rc.d/init.d/httpd" name="www"/>
>       <script file="/etc/rc.d/init.d/vsftpd" name="ftp"/>
>       <ip address="192.168.0.112" monitor_link="0"/>
>     </resources>
>     <service autostart="1" domain="aaa" name="ftp" recovery="relocate">
>       <ip ref="192.168.0.112"/>
>       <script ref="ftp"/>
>     </service>
>     <service autostart="1" domain="aaa" name="www" recovery="relocate">
>       <ip ref="192.168.0.111"/>
>       <script ref="www"/>
>     </service>
>   </rm>
> </cluster>
> ==========END==========
>
> Many Thanks,
> Dicky