[Linux-cluster] BladeCenter Fencing errors

Gary Romo garromo at us.ibm.com
Thu Jan 17 23:51:11 UTC 2008


See below;

Gary Romo
IBM Global Technology Services
303.458.4415
Email: garromo at us.ibm.com
Pager:1.877.552.9264
Text message: gromo at skytel.com



jim parsons <jparsons at redhat.com> wrote on 01/17/2008 03:40 PM
to linux clustering <linux-cluster at redhat.com>:

On Thu, 2008-01-17 at 14:06 -0700, Gary Romo wrote:
> 
> I enabled telnet on the MM, now I am getting these messages; 
> 
> Jan 17 14:00:24 node1 fenced[3229]: fence "node2" failed 
> Jan 17 14:00:29 node1 fenced[3229]: fencing node "node2" 
> Jan 17 14:00:40 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189 
> 
> Jan 17 14:00:40 node1 fenced[3229]: fence "node2" failed 
> Jan 17 14:00:45 node1 fenced[3229]: fencing node "node2" 
> Jan 17 14:00:56 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189 
> 
> Jan 17 14:00:56 node1 fenced[3229]: fence "node2" failed 
> Jan 17 14:01:01 node1 fenced[3229]: fencing node "node2" 
> Jan 17 14:01:12 node1 fenced[3229]: agent "fence_bladecenter" reports:
> pattern match timed-out at /sbin/fence_bladecenter line 189 
> 
> Line 189 looks like this; 
> 
>  ($text, $match) = $t->waitfor("/system:blade\\[$bladenum\\]>/"); 
> 
> 
> I am getting these on the second node; 
> 
> Jan 17 14:03:24 node2 fenced[3340]: fence "node1" failed 
> Jan 17 14:03:29 node2 fenced[3340]: fencing node "node1" 
> Jan 17 14:03:29 node2 fenced[3340]: fence "node1" failed 
> Jan 17 14:03:34 node2 fenced[3340]: fencing node "node1" 
> Jan 17 14:03:34 node2 fenced[3340]: fence "node1" failed 
> 
Ah, yuck. Well, let's figure out what is going on here.
Can you post the clusternodes and fencedevices sections of your
cluster.conf here? Just XXXX out any passwords.

<?xml version="1.0"?>
<cluster alias="rhcs-1-clus" config_version="4" name="rhcs-1-clus">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <clusternodes>
                <clusternode name="node1" votes="1">
                        <multicast addr="XXX.XXX.127.204" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device blade="2" name="chassis_fence"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node2" votes="1">
                        <multicast addr="XXX.XXX.127.204" interface="eth0"/>
                        <fence>
                                <method name="1">
                                        <device blade="3" name="chassis_fence"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <cman expected_votes="1" two_node="1">
                <multicast addr="XXX.XXX.127.204"/>
        </cman>
        <fencedevices>
                <fencedevice agent="fence_bladecenter" ipaddr="XXX.XXX.1.143" login="rchs_fence" name="chassis_fence" passwd="XXXXXXX"/>
        </fencedevices>
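
As a cross-check between this config and the status command requested in this thread, here is a minimal Python sketch that reads the clusternodes and fencedevices sections and prints one fence_bladecenter status invocation per node. It is an illustration, not part of the cluster tooling: the helper name status_commands is made up, the closing </cluster> tag is added so the pasted fragment parses, and only the flags that appear in this thread (-a, -l, -p, -n, -o, -v) are used.

```python
import shlex
import xml.etree.ElementTree as ET

# Trimmed copy of the config posted above. The closing </cluster> tag is an
# assumption added here so that the fragment is parseable.
CONF = """<?xml version="1.0"?>
<cluster alias="rhcs-1-clus" config_version="4" name="rhcs-1-clus">
  <clusternodes>
    <clusternode name="node1" votes="1">
      <fence><method name="1"><device blade="2" name="chassis_fence"/></method></fence>
    </clusternode>
    <clusternode name="node2" votes="1">
      <fence><method name="1"><device blade="3" name="chassis_fence"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_bladecenter" ipaddr="XXX.XXX.1.143"
                 login="rchs_fence" name="chassis_fence" passwd="XXXXXXX"/>
  </fencedevices>
</cluster>
"""


def status_commands(conf_text):
    """Return {node name: status command line} built from cluster.conf text.

    Hypothetical helper: if a node lists several fence devices, the last
    one wins, which is enough for the single-device config shown here.
    """
    root = ET.fromstring(conf_text)
    # Index fence devices by name so each node's <device> can be resolved.
    devices = {d.get("name"): d for d in root.iter("fencedevice")}
    commands = {}
    for node in root.iter("clusternode"):
        for dev in node.iter("device"):
            fd = devices[dev.get("name")]
            argv = ["/sbin/" + fd.get("agent"),
                    "-a", fd.get("ipaddr"), "-l", fd.get("login"),
                    "-p", fd.get("passwd"), "-n", dev.get("blade"),
                    "-o", "status", "-v"]
            commands[node.get("name")] = " ".join(shlex.quote(a) for a in argv)
    return commands


if __name__ == "__main__":
    for name, cmd in status_commands(CONF).items():
        print(name, "->", cmd)
```

Comparing the generated lines with what was typed by hand makes it easy to spot a mismatch in address, login, or blade number.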

On one of the cluster nodes, can you run 
'/sbin/fence_bladecenter -a <ip or hostname of bladecenter> -l <login>
-p <passwd> -n <blade number of another running node> -o status -v'

[root@lxdnt648 ~]# /sbin/fence_bladecenter -a chassis -l rchs_fence -p XXXXXXX -n 2 -o status -v
Please use '-h' for usage.

Do you know the firmware details of your bladecenter? The
fence_bladecenter script hasn't changed in years. The tested firmware
versions are listed at the top of the file. Maybe the interface has
changed. If so, the debuglog should give us information.
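
To illustrate why a changed interface would show up as "pattern match timed-out": line 189 of the agent, quoted earlier in this thread, waits for a prompt of the form system:blade[N]>. The Python sketch below reproduces that pattern and checks it against session text. The function names and the sample session strings are invented for illustration and are not real management module output.

```python
import re


def blade_prompt(bladenum):
    """Python equivalent of the pattern the agent waits for on line 189."""
    return re.compile(r"system:blade\[%d\]>" % bladenum)


def saw_prompt(session_text, bladenum):
    """True if the expected blade prompt appears anywhere in session_text."""
    return blade_prompt(bladenum).search(session_text) is not None


if __name__ == "__main__":
    # Both strings are made-up examples, not captured MM output.
    print(saw_prompt("OK\r\nsystem:blade[2]> ", 2))  # expected prompt present
    print(saw_prompt("system> ", 2))                 # prompt never changes to the blade
```

If the session never produces text matching this pattern, for instance because the firmware prints a different prompt, a wait on it can only end in a timeout.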


 1  chassis  Main application  BRET85M  CNETMNUS.PKT  01-10-07  16
             Boot ROM*         BRBR82A  CNETBRUS.PKT  06-01-05  16
             Remote control    BRRG85M  CNETRGUS.PKT  01-10-07  16


This will get us started.

-Jim
