[Linux-cluster] proper cluster crash procedures?
Mark Chaney
macscr at macscr.com
Mon Sep 29 08:16:08 UTC 2008
Here is my cluster.conf
#########################################
<?xml version="1.0"?>
<cluster alias="myiacon" config_version="16" name="myiacon">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="60"/>
  <clusternodes>
    <clusternode name="ratchet.local" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ratchet_ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="skydive.local" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="skydive_ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="wheeljack.local" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="wheeljack_ipmi"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.100" login="root" name="ratchet_ipmi" passwd="xxxxx"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.102" login="root" name="skydive_ipmi" passwd="xxxxx"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.101" login="root" name="wheeljack_ipmi" passwd="xxxxxx"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
#############################################
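For anyone who wants to sanity-check a config like the one above before digging into the crash itself, here is a rough Python sketch. It only verifies that every node references a fence device that is actually defined, and adds up the votes. The function names (check_fencing, majority_votes) are made up for illustration, the embedded config is a trimmed copy with the passwords left out, and the "more than half the votes" rule is a plain-majority assumption rather than anything read from the cluster software.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf from this post (passwords omitted).
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster alias="myiacon" config_version="16" name="myiacon">
  <clusternodes>
    <clusternode name="ratchet.local" nodeid="1" votes="1">
      <fence><method name="1"><device name="ratchet_ipmi"/></method></fence>
    </clusternode>
    <clusternode name="skydive.local" nodeid="2" votes="1">
      <fence><method name="1"><device name="skydive_ipmi"/></method></fence>
    </clusternode>
    <clusternode name="wheeljack.local" nodeid="3" votes="1">
      <fence><method name="1"><device name="wheeljack_ipmi"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.100" name="ratchet_ipmi"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.102" name="skydive_ipmi"/>
    <fencedevice agent="fence_ipmilan" ipaddr="192.168.1.101" name="wheeljack_ipmi"/>
  </fencedevices>
</cluster>
"""


def check_fencing(conf_text):
    """Return a list of problems: nodes with no fence device, or with a
    reference to a fence device that is not defined under <fencedevices>."""
    root = ET.fromstring(conf_text)
    defined = {dev.get("name") for dev in root.iter("fencedevice")}
    problems = []
    for node in root.iter("clusternode"):
        refs = [dev.get("name") for dev in node.iter("device")]
        if not refs:
            problems.append(f"{node.get('name')}: no fence device configured")
        for ref in refs:
            if ref not in defined:
                problems.append(f"{node.get('name')}: unknown fence device {ref}")
    return problems


def majority_votes(conf_text):
    """Smallest vote count that is more than half of all configured votes.
    Assumption: a plain majority rule; a missing votes attribute counts as 1."""
    root = ET.fromstring(conf_text)
    total = sum(int(node.get("votes", "1")) for node in root.iter("clusternode"))
    return total // 2 + 1


if __name__ == "__main__":
    for problem in check_fencing(CLUSTER_CONF):
        print("PROBLEM:", problem)
    print("votes needed for a majority:", majority_votes(CLUSTER_CONF))
```

To check the real file instead of the embedded copy, read it first, e.g. check_fencing(open(path).read()), where path is wherever cluster.conf lives on the node.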
And here is one of the errors I just started getting:
Sep 29 08:10:06 wheeljack openais[5453]: [MAIN ] Killing node ratchet.local because it has rejoined the cluster with existing state
But half the time, the servers just complain that they can't reconnect to the
cluster.
-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Mark Chaney
Sent: Monday, September 29, 2008 3:07 AM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] proper cluster crash procedures?
I have a 3-node cluster with shared storage on an iSCSI SAN, hence I am
using GFS. It crashed for some reason (I'm not sure whether something
was rebooted incorrectly or what), and I have now spent the past 2
hours trying to get the cluster back up. I would think that simply
rebooting all the nodes would work, but so far it hasn't. What should I be
doing? Should I just start the nodes up one at a time? BTW, I am using IPMI for
fencing, if that makes a difference. I can post my cluster.conf if that's
helpful, but I would think there are general techniques available.
Thanks,
Mark