[Linux-cluster] Problem with fenced on cluster with 2 BladeCenter machines: 1st machine is removed physically. The remaining one does not become Active (waiting for fenced)

Thistle, Scott Sthistle at gov.nl.ca
Thu Jul 12 15:13:52 UTC 2007


I am having the same issue. If a blade is not present (e.g. removed for
maintenance), fence_bladecenter cannot check its power state because the
bay is reported as empty. I think it is something simple to fix for those
versed in Perl. Normally the fence agent only runs against a blade that is
present; if the blade is removed while the cluster is running, you run
into this issue.

My case is below. Blade #3 is a good node and blade #2 was removed.
Fencing does not work with the blade removed.

system> env -T system:blade[3]
OK
system:blade[3]> power -state
On
system:blade[3]> env -T system:blade[2]
The target bay is empty. 
system:blade[3]> env -T system:blade[1]
OK
system:blade[1]>
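
One possible shape for the fix, sketched in Python rather than the agent's
Perl: recognise the "The target bay is empty." reply and treat it as a
blade that is already off, since a removed blade cannot be running. The
function names and return labels are hypothetical, the "Off" reply is
assumed by analogy with the "On" shown above, and counting an empty bay as
a successful fence is an assumption, not what the current agent does.

```python
import re

# Reply seen in the transcript above when the bay has no blade in it.
EMPTY_BAY = re.compile(r"target bay is empty", re.IGNORECASE)


def classify_reply(reply: str) -> str:
    """Map a management-module reply to 'on', 'off', 'empty' or 'unknown'."""
    text = reply.strip()
    if EMPTY_BAY.search(text):
        return "empty"
    lowered = text.lower()
    if lowered == "on":
        return "on"
    if lowered == "off":  # assumed counterpart of the "On" reply
        return "off"
    return "unknown"


def is_fenced(reply: str) -> bool:
    """Assumption: an empty bay is as good as a powered-off blade."""
    return classify_reply(reply) in ("off", "empty")
```

With this, is_fenced("The target bay is empty. ") is true, while
is_fenced("On") is false, so a removed blade would no longer block the
surviving node.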

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of James Parsons
Sent: Thursday, July 12, 2007 12:33 PM
To: linux clustering
Subject: Re: [Linux-cluster] Problem with fenced on cluster with 2
BladeCenter machines: 1st machine is removed physically. The remaining one
does not become Active (waiting for fenced)

catalin.lupescu at bull.net wrote:

>
> Hello!
>
> I have a Red Hat cluster made of 2 IBM blade nodes in a BladeCenter 
> chassis
> (fenced version 1.32.6).
>
> I ran the following test:
> I physically removed the node 1 machine (the active one).
> The second node never becomes the active one. The "clustat" command does 
> not print any information.
> In /var/log/messages we find the following messages (repeated):
>
> Jul 11 17:46:24 cdrc1-2 fenced[4214]: fencing node "cdrc1-1"
> Jul 11 17:46:38 cdrc1-2 fenced[4214]: agent "fence_bladecenter" 
> reports: pattern match timed-out at /sbin/fence_bladecenter line 185 
> Jul 11 17:46:38 cdrc1-2 fenced[4214]: fence "cdrc1-1" failed
>
> If node 1 is plugged in, node 2 becomes active (fenced OK).
>
bz#240509 changed the sleep timeout in the bladecenter agent from 5 to
10. This is on or about line 193 in /sbin/fence_bladecenter. See what
yours is set to, and try pushing it out a bit. This minor change is
making its way through the distribution chain now.
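
To find the value in your copy, a minimal Python sketch like the one below
may help. It only lists lines near a given line number that mention
"sleep"; the helper name, the default window of 15 lines, and the
assumption that the timeout appears on a line containing the word "sleep"
are all mine, so check the output by eye.

```python
import re


def find_sleep_lines(script_text, center=193, window=15):
    """Return (line_number, text) pairs near `center` that mention sleep."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        # Keep only lines inside the window around the expected location.
        if abs(number - center) <= window and re.search(r"\bsleep\b", line):
            hits.append((number, line.strip()))
    return hits
```

Read the script with open("/sbin/fence_bladecenter").read() and pass the
text in; widen `window` if nothing is returned, since line numbers differ
between versions of the agent.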

-j

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster



