[Linux-cluster] Cluster reboot problems
peyrardj at bull.net
Wed Jan 28 13:56:45 UTC 2009
I have found this on cluster-2.03.11/doc/usage.txt :
- To avoid unnecessary fencing when starting the cluster, it's best for
all nodes to join the cluster (complete cman_tool join) before any
of them do fence_tool join.
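In practice that ordering looks roughly like the sketch below, run on every node (the -w "wait for completion" flags are in the cluster 2.x tools as far as I remember):

  # step 1 - on ALL nodes; let it finish everywhere before going further:
  cman_tool join -w
  cman_tool nodes          # confirm every node shows up as a member
  # step 2 - only once all nodes are cluster members:
  fence_tool join -w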
I think something should be fixed to resolve this issue.
It is a real problem on a "production" system.
Once the fence domain has been formed (after a fence_tool join), a node that joins afterwards could not enter the cluster.
You have to do the join at the same time on all nodes; a sketch of what I mean is below.
Strange behaviour... I have this problem on RHEL 5.3.
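By "at the same time" I mean something like the loop below from an admin box (or the equivalent with Konsole's send-input-to-all-sessions, as Mark does); the short hostnames are just the c1/c2/c3 nodes from the cluster.conf further down:

  for n in c1 c2 c3; do ssh $n "service cman start" & done; wait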
On Fri, Jan 23, 2009 at 02:32:25PM +0000, Mark Watts wrote:
> I've got a 3-node RHEL 5.3 cluster. I'm running the cluster nodes as XEN Dom0
> domains so I can deploy DomU domains as vm services within the cluster.
> Hardware is:
> 3 x Dell PowerEdge 1855 blades
> 2 x Dell PowerConnect 5316M Ethernet modules (for eth0 and eth1)
> I have a 4th blade acting as an iSCSI target, exporting a 2GB and two 20GB
> targets. The 2GB target is used as /etc/xen/ on the cluster nodes, mounted as
> a _netdev mount in /etc/fstab on the cluster nodes (mounted on /xen, with
> symlinks from /etc/xen to /xen/xen).
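For what it's worth, an fstab entry of roughly this shape would match that description; the device path and filesystem type below are guesses, not taken from the mail:

  # /etc/fstab on each node - mount deferred until the network (iSCSI) is up
  /dev/sdb1    /xen    gfs    _netdev,defaults    0 0

  # and /etc/xen replaced by a symlink into the shared mount
  ln -s /xen/xen /etc/xen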
> All network traffic uses the same switch module, since I'm only using eth0 at
> this time.
> To install the nodes, I'm kickstarting from a Satellite, and doing a "yum
> update" followed by a reboot to get to RHEL 5.3.
> I also deploy the same cluster.conf to each node (appended to this email).
> I then bring up cman, rgmanager, clvmd and gfs on all nodes (using the "Send
> input to all sessions" feature of Konsole to start the services at the same
> time on all nodes). This brings up the cluster, and allows me to mount the
> iSCSI target for /xen.
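Presumably the manual start amounts to something like this on each node; the ordering below is the stock RHEL 5 init-script ordering rather than a transcript of the original session:

  service cman start        # membership, ccsd, fence domain
  service clvmd start       # clustered LVM
  service gfs start         # GFS mounts
  service rgmanager start   # cluster resource manager
  mount /xen                # the _netdev iSCSI mount from fstab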
> Starting xend allows me to enable the vm service listed in cluster.conf
> (clusvcadm -e vm:node1)
> Oh, I also log *.* to a syslog server so I can see all the logs in one place.
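That kind of forward is a single /etc/syslog.conf line, with "loghost" standing in for the real log server:

  *.*    @loghost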
> Nodes are:
> "So far so good", I think.
> So, I enable cman, rgmanager, clvmd, gfs and xend to start on boot and reboot
> the cluster (all three nodes at the same time)
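Assuming the stock init scripts, enabling those at boot is just:

  chkconfig cman on
  chkconfig clvmd on
  chkconfig gfs on
  chkconfig rgmanager on
  chkconfig xend on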
> At which point everything starts to fall apart.
> As the nodes come up and try and create a cluster, nodes c1 and c2 appear to
> form a cluster, and then fence node c3 when it joins.
> When node c3 comes back up and tries to join the cluster, node c1 decides the
> cluster is no-longer quorate, and fences node c2.
> When node c2 comes back up and tries to join the cluster, node c1 decides the
> cluster is no-longer quorate, and fences node c3.
> This then continues for as long as I'm entertained watching the logs, and then I
> switch off all three servers.
> Does anyone have any insight into what the difference is between starting the
> cluster services manually and starting them at boot, and why that
> difference (because I can't think of any other difference between the two
> states) would prevent me from ever getting a stable cluster?
> I'm at a bit of a loss really - I moved from a 2-node cluster to a 3-node one
> to try and avoid exactly these problems.
> I've also had the same problem with a CentOS 5.2 cluster on the same
> hardware - in that case the nodes were still fencing each other the following
> morning, 18 hours later!
> Mark Watts BSc RHCE MBCS
> Senior Systems Engineer
> QinetiQ Applied Technologies
> GPG Key: http://www.linux-corner.info/mwatts.gpg
> <?xml version="1.0"?>
> <cluster alias="WebFarmTest" config_version="1" name="WebFarmTest">
>   <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>   <clusternodes>
>     <clusternode name="c1.eris.qinetiq.com" nodeid="1" votes="1">
>       <fence>
>         <method name="1">
>           <device name="DRACMC" modulename="Server-1" action="Off"/>
>           <device name="DRACMC" modulename="Server-1" action="On"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="c2.eris.qinetiq.com" nodeid="2" votes="1">
>       <fence>
>         <method name="1">
>           <device name="DRACMC" modulename="Server-2" action="Off"/>
>           <device name="DRACMC" modulename="Server-2" action="On"/>
>         </method>
>       </fence>
>     </clusternode>
>     <clusternode name="c3.eris.qinetiq.com" nodeid="3" votes="1">
>       <fence>
>         <method name="1">
>           <device name="DRACMC" modulename="Server-3" action="Off"/>
>           <device name="DRACMC" modulename="Server-3" action="On"/>
>         </method>
>       </fence>
>     </clusternode>
>   </clusternodes>
>   <cman expected_votes="2"/>
>   <fencedevices>
>     <fencedevice agent="fence_drac" ipaddr="XXX" login="XXX" name="DRACMC" passwd="XXX"/>
>   </fencedevices>
>   <rm>
>     <failoverdomains>
>       <failoverdomain name="webfarm-fd" nofailback="0" ordered="0" restricted="1">
>         <failoverdomainnode name="c1.eris.qinetiq.com" priority="1"/>
>         <failoverdomainnode name="c2.eris.qinetiq.com" priority="1"/>
>         <failoverdomainnode name="c3.eris.qinetiq.com" priority="1"/>
>       </failoverdomain>
>     </failoverdomains>
>     <vm autostart="1" domain="webfarm-fd" exclusive="1" migrate="live" name="node1" path="/etc/xen/" recovery="relocate"/>
>   </rm>
> </cluster>
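One note on the quorum messages above, hedged because I'm quoting the formula from memory: cman takes the quorum threshold as expected_votes/2 + 1 (integer division). With three one-vote nodes that is 3/2 + 1 = 2, so any two nodes together are quorate and a lone node is not; the hard-coded expected_votes="2" also gives 2/2 + 1 = 2, so the arithmetic works out the same, but the value does look left over from the earlier two-node setup Mark mentions.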
> Linux-cluster mailing list
> Linux-cluster at redhat.com