[Linux-cluster] BladeCenter Fencing errors
Gary Romo
garromo at us.ibm.com
Thu Jan 17 23:51:11 UTC 2008
See below;
Gary Romo
IBM Global Technology Services
303.458.4415
Email: garromo at us.ibm.com
Pager:1.877.552.9264
Text message: gromo at skytel.com
jim parsons <jparsons at redhat.com>
Sent by: linux-cluster-bounces at redhat.com
01/17/2008 03:40 PM
Please respond to: linux clustering <linux-cluster at redhat.com>
To: linux clustering <linux-cluster at redhat.com>
cc: linux-cluster-bounces at redhat.com
Subject: Re: [Linux-cluster] BladeCenter Fencing errors
On Thu, 2008-01-17 at 14:06 -0700, Gary Romo wrote:
>
> I enabled telnet on the MM; now I am getting these messages:
>
> Jan 17 14:00:24 node1 fenced[3229]: fence "node2" failed
> Jan 17 14:00:29 node1 fenced[3229]: fencing node "node2"
> Jan 17 14:00:40 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189
>
> Jan 17 14:00:40 node1 fenced[3229]: fence "node2" failed
> Jan 17 14:00:45 node1 fenced[3229]: fencing node "node2"
> Jan 17 14:00:56 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189
>
> Jan 17 14:00:56 node1 fenced[3229]: fence "node2" failed
> Jan 17 14:01:01 node1 fenced[3229]: fencing node "node2"
> Jan 17 14:01:12 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189
>
> Line 189 looks like this;
>
> ($text, $match) = $t->waitfor("/system:blade\\[$bladenum\\]>/");
>
>
> I am getting these on the second node:
>
> Jan 17 14:03:24 node2 fenced[3340]: fence "node1" failed
> Jan 17 14:03:29 node2 fenced[3340]: fencing node "node1"
> Jan 17 14:03:29 node2 fenced[3340]: fence "node1" failed
> Jan 17 14:03:34 node2 fenced[3340]: fencing node "node1"
> Jan 17 14:03:34 node2 fenced[3340]: fence "node1" failed
>
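For anyone following along: the pattern on line 189 waits for a prompt of the form system:blade[N]> from the management module, and the "pattern match timed-out" error means that text never arrived within the timeout. The sketch below only illustrates the match being attempted. It is Python rather than the agent's Perl, the function names are made up for this example, and the sample prompt strings are hypothetical, not captured from a real MM session.

```python
import re

def blade_prompt(bladenum):
    """Regex equivalent to the Perl pattern quoted from line 189."""
    # The brackets are escaped so they match literally, as in the Perl source.
    return re.compile(rf"system:blade\[{int(bladenum)}\]>")

def prompt_seen(output, bladenum):
    """Return True if the expected blade prompt occurs anywhere in output."""
    return blade_prompt(bladenum).search(output) is not None

# Hypothetical session text, for illustration only.
print(prompt_seen("system:blade[3]> ", 3))  # prompt for the requested blade
print(prompt_seen("system> ", 3))           # some other prompt
```

If the MM firmware prints a prompt that differs at all from this form, the wait would run until the timeout, which is consistent with the log above.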
Ah, yuck. Well, let's figure out what is going on here.
Can you post the clusternodes and fencedevices sections of your
cluster.conf here? Just XXXX out any passwords.
<?xml version="1.0"?>
<cluster alias="rhcs-1-clus" config_version="4" name="rhcs-1-clus">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="node1" votes="1">
      <multicast addr="XXX.XXX.127.204" interface="eth0"/>
      <fence>
        <method name="1">
          <device blade="2" name="chassis_fence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" votes="1">
      <multicast addr="XXX.XXX.127.204" interface="eth0"/>
      <fence>
        <method name="1">
          <device blade="3" name="chassis_fence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1">
    <multicast addr="XXX.XXX.127.204"/>
  </cman>
  <fencedevices>
    <fencedevice agent="fence_bladecenter" ipaddr="XXX.XXX.1.143"
                 login="rchs_fence" name="chassis_fence" passwd="XXXXXXX"/>
  </fencedevices>
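To make the node-to-blade mapping in the sections above easier to check, here is a minimal Python sketch that reads a cluster.conf and lists which fence agent, address and blade number each node would be fenced with. It is an illustration only: the embedded XML is a trimmed copy of the posted sections with a closing </cluster> tag added so it parses, and the function name fence_map is made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the posted sections; </cluster> added so the text parses.
CONF = """<cluster alias="rhcs-1-clus" config_version="4" name="rhcs-1-clus">
  <clusternodes>
    <clusternode name="node1" votes="1">
      <fence><method name="1">
        <device blade="2" name="chassis_fence"/>
      </method></fence>
    </clusternode>
    <clusternode name="node2" votes="1">
      <fence><method name="1">
        <device blade="3" name="chassis_fence"/>
      </method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_bladecenter" ipaddr="XXX.XXX.1.143"
                 login="rchs_fence" name="chassis_fence" passwd="XXXXXXX"/>
  </fencedevices>
</cluster>"""

def fence_map(conf_text):
    """Map each node name to the agent, address and blade used to fence it."""
    root = ET.fromstring(conf_text)
    devices = {d.get("name"): d for d in root.iter("fencedevice")}
    result = {}
    for node in root.iter("clusternode"):
        # If a node lists several devices, the last one wins in this sketch.
        for dev in node.iter("device"):
            fd = devices.get(dev.get("name"))
            result[node.get("name")] = {
                "agent": fd.get("agent") if fd is not None else None,
                "ipaddr": fd.get("ipaddr") if fd is not None else None,
                "blade": dev.get("blade"),
            }
    return result

for name, info in fence_map(CONF).items():
    print(name, info)
```

A device whose name has no matching fencedevice entry shows up with agent and ipaddr set to None, which is a quick way to spot a mismatched name.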
On one of the cluster nodes, can you run
'/sbin/fence_bladecenter -a <ip or hostname of bladecenter> -l <login>
-p <passwd> -n <blade number of another running node> -o status -v'
[root at lxdnt648 ~]# /sbin/fence_bladecenter -a chassis -l rchs_fence -p
XXXXXXX -n 2 -o status -v
Please use '-h' for usage.
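For reference, the status query requested above can be assembled as an explicit argument list, which avoids line-wrap and quoting accidents when pasting a long command. This is a hedged sketch: it uses only the flags that appear in this thread (-a, -l, -p, -n, -o, -v), the helper name status_command is made up, and it only prints the command rather than running it.

```python
import shlex

AGENT = "/sbin/fence_bladecenter"  # path as given in this thread

def status_command(address, login, passwd, blade):
    """Build the argument list for a verbose status query."""
    return [AGENT, "-a", address, "-l", login, "-p", passwd,
            "-n", str(blade), "-o", "status", "-v"]

cmd = status_command("chassis", "rchs_fence", "XXXXXXX", 2)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list could be passed to subprocess.run on a cluster node, though that step is left out here on purpose.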
Do you know the firmware details of your bladecenter? The
fence_bladecenter script hasn't changed in years... The tested firmware
versions are listed at the top of the file. Maybe the interface has
changed. If so, the debuglog should give us information.
1  chassis  Main application  BRET85M  CNETMNUS.PKT  01-10-07  16
            Boot ROM*         BRBR82A  CNETBRUS.PKT  06-01-05  16
            Remote control    BRRG85M  CNETRGUS.PKT  01-10-07  16
This will get us started.
-Jim
--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster