[Linux-cluster] Problems with Cluster
Marc Grimme
grimme at atix.de
Tue Jun 12 06:16:30 UTC 2007
On Tuesday 12 June 2007 03:29:00 Manish Kathuria wrote:
> On 6/11/07, Robert Gil <Robert.Gil at americanhm.com> wrote:
> > If ilo itself is off, fencing doesn't work.
>
> Isn't there any timeout setting such that if the ILO doesn't respond
> for a certain amount of time, it is treated as fenced and the node is
> considered to be dead and the failover takes place?
As far as I remember, there is only a TCP timeout when establishing the
connection to the ILO card, and it takes a very long time to occur (it is a
default setting and takes minutes). I'm not sure how or where to set it.
We have had this discussion (especially about ILO cards) nearly every time
we used them, and for that and other reasons we had to build our own
fence_ilo agent. I'm quite sure that we solved the timeout problem in the
end. The timeout is set to 10 seconds by default (Config.timeout).
You can find it at
http://download.atix.de/yum/comoonics/productive/noarch/RPMS/comoonics-bootimage-fenceclient-ilo-0.1-16.noarch.rpm
or directly use the yum/up2date-channel as described here:
http://www.open-sharedroot.org/faq/can-i-use-yum-or-up2date-to-install-the-software/
then install "comoonics-bootimage-fenceclient-ilo" and there you go.
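To illustrate the timeout idea, here is a minimal Python sketch. It is not
the code of our fence_ilo agent; the function name, port 443 and the
10-second default are assumptions for illustration only. It bounds the TCP
connect to the ILO card, so an unreachable card is reported within seconds
instead of minutes.

```python
import socket


def ilo_reachable(host, port=443, timeout=10.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: name, port and default timeout are assumptions.
    """
    try:
        # create_connection applies `timeout` to the connect attempt itself
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, host unreachable, DNS errors and timeouts
        return False
```

Note that an unreachable ILO only means the fence attempt failed quickly. It
must not be taken as proof that the node is fenced.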
>
> > Did you add ilo as a fence device? And create a user? You create a user
> > in the ilo for that blade, not on the chassis. You have to reboot the
> > blade to get to the ilo manager.
>
> Yes, had added respective ILOs as fence devices for both the servers
> and created users also.
We are doing so as well. Always a power user for ilo devices.
We are also automating this with the ilo client.
There is an undocumented switch -x in the fence_ilo client referenced above.
It takes a file that might look as follows, and with it you'll have your
user.
<USER_INFO MODE="write">
  <ADD_USER
    USER_NAME="power"
    USER_LOGIN="power"
    PASSWORD="the_password">
    <ADMIN_PRIV value="N"/>
    <REMOTE_CONS_PRIV value="N"/>
    <RESET_SERVER_PRIV value="Y"/>
    <VIRTUAL_MEDIA_PRIV value="N"/>
    <!-- Firmware support information for next tag: -->
    <!-- iLO 2 - All versions. -->
    <!-- iLO - All versions. -->
    <!-- RILOE II - None. -->
    <CONFIG_ILO_PRIV value="Yes"/>
    <!-- Firmware support information for next 3 tags: -->
    <!-- iLO 2 - None. -->
    <!-- iLO - None. -->
    <!-- RILOE II - All versions. -->
    <!--
    <CONFIG_RILO_PRIV value="Y"/>
    <LOGIN_PRIV value="Y"/>
    <CLIENT_RANGE value="10.10.10.1 - 254.255.255.255"/>
    -->
    <!-- Firmware support information for next 6 tags: -->
    <!-- iLO 2 - None. -->
    <!-- iLO - Version 1.40 and earlier. -->
    <!-- RILOE II - None. -->
    <!--
    <VIEW_LOGS_PRIV value="Yes"/>
    <CLEAR_LOGS_PRIV value="Yes"/>
    <EMS_PRIV value="Yes"/>
    <UPDATE_ILO_PRIV value="No"/>
    <CONFIG_RACK_PRIV value="Yes"/>
    <DIAG_PRIV value="Yes"/>
    -->
  </ADD_USER>
</USER_INFO>
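If you need one such file per blade, a small script can generate it. The
following is only a sketch using the Python standard library; the function
name and its parameters are hypothetical, and the tags and values simply
mirror the active (uncommented) entries of the file above.

```python
import xml.etree.ElementTree as ET

# Privileges taken from the uncommented tags in the example file
POWER_USER_PRIVS = [
    ("ADMIN_PRIV", "N"),
    ("REMOTE_CONS_PRIV", "N"),
    ("RESET_SERVER_PRIV", "Y"),
    ("VIRTUAL_MEDIA_PRIV", "N"),
    ("CONFIG_ILO_PRIV", "Yes"),
]


def build_add_user_xml(login, password, name=None):
    """Return a USER_INFO/ADD_USER fragment as a string (hypothetical helper)."""
    user_info = ET.Element("USER_INFO", MODE="write")
    add_user = ET.SubElement(
        user_info,
        "ADD_USER",
        USER_NAME=name or login,
        USER_LOGIN=login,
        PASSWORD=password,
    )
    for tag, value in POWER_USER_PRIVS:
        ET.SubElement(add_user, tag, value=value)
    return ET.tostring(user_info, encoding="unicode")


if __name__ == "__main__":
    print(build_add_user_xml("power", "the_password"))
```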
>
>
> I just want to make sure that automatic fencing happens and failover
> takes place even when there is a complete power failure for one node
Even if the timeout works, you'll still need a second fence mechanism,
because the timeout only makes the failed ILO fence attempt return quickly.
You might think about using fence_manual as a last resort, so that the cluster
can be brought back online after a power failure, following manual intervention.
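The idea of a second fence mechanism can be sketched as follows. This is a
plain Python model of "try the fence methods in order until one succeeds",
not the fence daemon's actual implementation; all names here, including the
two stub agents, are hypothetical.

```python
def fence_with_fallback(node, methods):
    """Try each (name, agent) pair in order.

    Returns the name of the first method whose agent reports success,
    or None if every method fails.
    """
    for name, agent in methods:
        try:
            if agent(node):
                return name
        except Exception:
            # A failing agent (for example a connect timeout) means
            # "not fenced"; move on to the next method.
            continue
    return None


def ilo_agent(node):
    # Stub: models an ILO card that lost power together with the node
    raise TimeoutError("ILO of %s did not answer" % node)


def manual_agent(node):
    # Stub: models an administrator acknowledging the manual fence
    return True


result = fence_with_fallback(
    "node2", [("ilo", ilo_agent), ("manual", manual_agent)]
)
```

Here the ILO method fails and the manual method is used, so `result` names
the fallback method. With only the ILO method configured, no method
succeeds and the failover stays blocked.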
Regards Marc.
>
> > -----Original Message-----
> > From: linux-cluster-bounces at redhat.com
> > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Manish Kathuria
> > Sent: Monday, June 11, 2007 12:45 PM
> > To: linux clustering
> > Subject: Re: [Linux-cluster] Problems with Cluster
> >
> > On 6/11/07, Maciej Bogucki <maciej.bogucki at artegence.com> wrote:
> > > Manish Kathuria wrote:
> > > > We want the failover to happen when the power supply fails to either
> > > > of the nodes. In order to test the scenario, we removed the power
> > > > cables from one of the nodes. However the failover did not happen
> > > > and upon observing the logs we found that the alive node could not
> > > > connect to the fence device (ILO in this case) of the dead node
> > > > since it was powered off and the fencing could not take place. Does
> > > > this mean that we would not be able to have a failover in case of
> > > > power failure for one of the nodes? Is there a way we can do it?
> > > > How is the cluster supposed to react when the ILO itself is powered
> > > > off?
> > >
> > > You need to perform manual fencing (administrator reaction) when it
> > > happens.
> >
> > Isn't there any way which is automated and does not require manual
> > intervention? Otherwise, the whole purpose gets defeated.
>
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
--
Gruss / Regards,
Marc Grimme
Phone: +49-89 452 3538-14
http://www.atix.de/ http://www.open-sharedroot.org/
**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
Registergericht: Amtsgericht München
Registernummer: HRB 131682
USt.-Id.: DE209485962
Geschäftsführung: Marc Grimme, Mark Hlawatschek, Thomas Merz