[Linux-cluster] nodes halted with net lost

ESGLinux esggrupos at gmail.com
Tue Apr 28 17:07:17 UTC 2009


Hello Gordan,

Fencing works fine when the switch is OK. I can fence one node from the
other, and when communication is lost the cluster fences the node that
needs to be fenced. (I have a funny issue: when I start the firewall and
communication is lost, the nodes fence each other...)

You say it's normal that the nodes halt until the switch comes up, but I
would prefer that they reboot. I use the fence_ipmilan agent for fencing,
but I don't know how to configure it to do what I want.

Any ideas?

Thanks,

ESG





2009/4/28 Gordan Bobic <gordan at bobich.net>

> On Tue, 28 Apr 2009 17:21:13 +0200, ESGLinux <esggrupos at gmail.com> wrote:
>
> > The nodes are connected through a single switch (I know, this is a
> > single point of failure...). If I reboot the switch, the two nodes halt.
> > (Fencing can't be done because they go through the same switch.)
>
> If they can't fence each other, cluster services will pause until fencing
> can be performed and verified. If this isn't happening (because the only
> path between them, which also covers fencing, is gone), then the behaviour
> you are seeing is expected. But when the switch comes back up, they should
> resume.
>
> If they don't resume when the switch comes back up, then that sounds like a
> fencing configuration issue. Have you verified that fencing works and that
> each node can successfully fence the other?
>
> It is normally a good idea to isolate the cluster communication to a
> dedicated interface. If you only have 2 nodes, you could just connect them
> directly on a dedicated interface, without a switch.
>
> > I don't know if this behaviour is normal and if it's possible to control
> > it. When this happens, I want the nodes to do nothing, or at least to
> > reboot rather than halt.
>
> You can configure the action you want the fencing agent to perform. Look
> up the man page for the fencing agent you are using. I thought the default
> was to reboot (at least it is for the DRAC agent).
>
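
As a rough illustration of the point above: fence agents accept their
options as key=value lines on standard input, so the action is just one
more key. The Python sketch below only builds such input. The helper name
build_fence_input, the address and the credentials are made up, and the
key that selects the action is an assumption (some agent versions may call
it "option" rather than "action"), so check the fence_ipmilan man page for
your version before relying on it.

```python
# Hypothetical helper: builds key=value text in the style a fence agent
# reads on stdin. Nothing here talks to real hardware.
ALLOWED_ACTIONS = {"on", "off", "reboot", "status"}


def build_fence_input(ipaddr, login, passwd, action="reboot",
                      action_key="action"):
    """Return the stdin text for a fence agent, one key=value per line.

    action_key is an assumption: pass "option" instead if your agent
    version documents that name in its man page.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    options = {
        "ipaddr": ipaddr,      # address of the node's IPMI interface
        "login": login,
        "passwd": passwd,
        action_key: action,    # reboot instead of a plain power-off
    }
    return "".join("%s=%s\n" % (key, value) for key, value in options.items())


# Example values only; 192.0.2.10 is a documentation address.
print(build_fence_input("192.0.2.10", "admin", "secret"), end="")
```

Text like this could be piped to the agent by hand to check what it does
with a given action before changing the cluster configuration.
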
> Gordan
>