[Linux-cluster] Active-Active configuration of arbitrary services
Lon Hohberger
lhh at redhat.com
Fri Oct 19 19:16:01 UTC 2007
On Fri, 2007-10-19 at 14:53 +0000, Glenn Aycock wrote:
> We are running RHCS on RHEL 4.5 and have a basic 2-node HA cluster
> configuration for a critical application in place and functional. The
> config looks like this:
> <?xml version="1.0"?>
> <cluster config_version="16" name="routing_cluster">
>   <fence_daemon post_fail_delay="0" post_join_delay="10"/>
>   <clusternodes>
>     <clusternode name="host1" votes="1">
>       <fence>
>         <method name="1">
>           <device name="manual" nodename="host1"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="host2" votes="1">
>       <fence>
>         <method name="1">
>           <device name="manual" nodename="host2"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman dead_node_timeout="10" expected_votes="1" two_node="1"/>
>   <fencedevices>
>     <fencedevice agent="fence_manual" name="manual"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="routing_servers" ordered="1" restricted="1">
>         <failoverdomainnode name="host1" priority="1"/>
>         <failoverdomainnode name="host2" priority="2"/>
>       </failoverdomain>
>     </failoverdomains>
>     <resources>
>       <script file="/etc/init.d/rsd" name="rsd"/>
>       <ip address="123.456.78.9" monitor_link="1"/>
>     </resources>
>     <service autostart="1" domain="routing_servers" name="routing_daemon" recovery="relocate">
>       <ip ref="123.456.78.9"/>
>       <script ref="rsd"/>
>     </service>
>   </rm>
> </cluster>
> The cluster takes about 15-20 seconds to notice that the daemon is
> down and migrate it to the other node. However, due to slow migration
> and startup time, we now require the daemon on the secondary to be
> active and only transfer the VIP in case it aborts on the primary.
You could start by decreasing the status check interval, i.e. by
tweaking the "status" action in /usr/share/cluster/script.sh:
<action name="status" interval="30s" timeout="0"/>
<action name="monitor" interval="30s" timeout="0"/>
Change to:
<action name="status" interval="10s" timeout="0"/>
<action name="monitor" interval="10s" timeout="0"/>
(as an example...)
You can also make a wrapper script which doesn't do the stop phase of
your rsd script unless rsd is already in a non-working state (to
prevent the stop-before-start that rgmanager normally does):
#!/bin/bash
#
# Wrapper around /etc/init.d/rsd that skips the stop phase
# while the daemon is healthy, so rsd can keep running on
# both nodes and only the VIP moves on failover.
SCR=/etc/init.d/rsd

case $1 in
start)
	# Should be a no-op if already running
	$SCR start
	exit $?
	;;
stop)
	# Don't actually stop it if it's running; just
	# clean it up if it's broken.  This app is
	# safe to run on multiple nodes.
	if ! $SCR status; then
		$SCR stop
		exit $?
	fi
	exit 0
	;;
status)
	$SCR status
	exit $?
	;;
esac
exit 0
(Note: rsd will have to be enabled on boot for this to work).
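To wire the wrapper in, you would point the <script> resource at the
wrapper instead of at /etc/init.d/rsd directly, and enable rsd at boot
on both nodes.  A rough sketch (the wrapper path and name here are just
an illustration, not anything shipped with the cluster suite):

    <resources>
      <script file="/usr/local/sbin/rsd-wrapper" name="rsd"/>
      <ip address="123.456.78.9" monitor_link="1"/>
    </resources>

and on each node:

    chkconfig rsd on

With that in place, rgmanager's "stop" on the old node is effectively a
no-op while rsd is healthy there, so only the IP resource actually
relocates.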
-- Lon