[Linux-cluster] Cluster startup weirdness?

steve at linuxsuite.org steve at linuxsuite.org
Tue Jun 30 17:32:06 UTC 2009




	I am trying to set up a minimal proof of concept with
RHCS on CentOS 5.3: three nodes in a cluster (vz1, vz2, vz3) and
two services, each just the Apache default page, as defined in the
cluster.conf below.

	If I do

service cman start

on vz1 and vz2, they both hang trying to do "fence_tool -w join",

yet clustat and cman_tool status show cluster membership and quorum.

No services are running.

If I run tcpdump on vz3, I see that initially both vz1 and vz2
send out (from port 5149) to the multicast address, but then vz2
stops and only vz1 continues. Is this correct behaviour?

  If I then do

service cman start

on vz3, everything runs (i.e. fence_tool doesn't hang). tcpdump on vz3
shows vz1, vz2 and vz3 doing multicast, then vz2 and vz3 drop out and
only vz1 continues with multicast. vz3 has taken on service vz1; service
vz2 never comes up.

	Ideas? Or how do I get services vz1 and vz2 running
with vz3 as a spare failover node?
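
	For the "vz3 as spare" part, one way to express that in
cluster.conf is to give vz3 a strictly lower preference (a higher
priority number) than the other nodes in each ordered failover
domain. A sketch only, reusing the node and domain names from the
config below; lower number means more preferred in rgmanager's
ordered domains:

```xml
<!-- Sketch: ordered failover domains where vz3 is the last choice.
     vz3 gets priority 3 in both domains, so rgmanager should place
     a service there only when both vz1 and vz2 are unavailable. -->
<failoverdomains>
	<failoverdomain name="vz1" ordered="1" restricted="1">
		<failoverdomainnode name="vz1" priority="1"/>
		<failoverdomainnode name="vz2" priority="2"/>
		<failoverdomainnode name="vz3" priority="3"/>
	</failoverdomain>
	<failoverdomain name="vz2" ordered="1" restricted="1">
		<failoverdomainnode name="vz2" priority="1"/>
		<failoverdomainnode name="vz1" priority="2"/>
		<failoverdomainnode name="vz3" priority="3"/>
	</failoverdomain>
</failoverdomains>
```

Remember to bump config_version and propagate the change to all
nodes after editing.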

	thanx - steve

	Below is the cluster.conf generated by system-config-cluster:


<?xml version="1.0" ?>
<cluster config_version="2" name="VPS">
	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="vz1" nodeid="1" votes="1">
			<fence>
				<method name="1">
					<device lanplus="" name="vz1_fence"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="vz2" nodeid="2" votes="1">
			<fence>
				<method name="1">
					<device lanplus="" name="vz2_fence"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="vz3" nodeid="3" votes="1">
			<fence>
				<method name="1">
					<device lanplus="" name="vz3_fence"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman/>
	<fencedevices>
		<fencedevice agent="fence_ipmilan" auth="PASSWORD"
			ipaddr="10.254.31.201" login="root" name="vz1_fence" passwd="changeme"/>
		<fencedevice agent="fence_ipmilan" auth="PASSWORD"
			ipaddr="10.254.31.202" login="root" name="vz2_fence" passwd="changeme"/>
		<fencedevice agent="fence_ipmilan" auth="PASSWORD"
			ipaddr="10.254.31.203" login="root" name="vz3_fence" passwd="changeme"/>
	</fencedevices>
	<rm>
		<failoverdomains>
			<failoverdomain name="vz1" ordered="1" restricted="1">
				<failoverdomainnode name="vz1" priority="1"/>
				<failoverdomainnode name="vz2" priority="2"/>
				<failoverdomainnode name="vz3" priority="2"/>
			</failoverdomain>
			<failoverdomain name="vz2" ordered="1" restricted="1">
				<failoverdomainnode name="vz1" priority="2"/>
				<failoverdomainnode name="vz2" priority="1"/>
				<failoverdomainnode name="vz3" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources>
			<ip address="10.254.32.201" monitor_link="1"/>
			<script file="/etc/init.d/httpd" name="vzstart"/>
			<ip address="10.254.32.202" monitor_link="1"/>
		</resources>
		<service autostart="1" domain="vz1" exclusive="1" name="vz1" recovery="relocate">
			<ip ref="10.254.32.201"/>
			<script ref="vzstart"/>
		</service>
		<service autostart="1" domain="vz2" exclusive="1" name="vz2" recovery="relocate">
			<ip ref="10.254.32.202"/>
			<script ref="vzstart"/>
		</service>
	</rm>
</cluster>