[Linux-cluster] Re: More CS4 fencing fun
Lon Hohberger
lhh at redhat.com
Mon Mar 27 20:41:57 UTC 2006
On Fri, 2006-03-24 at 11:06 +0100, Matteo Catanese wrote:
> Hi Lon,
> your mail is "music" to my ears :D
>
> I will try your /sbin/fence_dontcare immediately.
Best wishes! If it breaks, all of the pieces are yours to keep.
> I don't want to be interrupted on weekends when I play my
> favourite video game (WOW) just because ONE component broke and the
> whole cluster hung :-)
Great game.
> Sure, our hardware configuration can also sustain some multi-point
> failures, but NSPOF is our main goal
Remember that a redundant remote power switch doesn't obviate the need
for iLO. iLO is *much* more than a power button. It has remote console
abilities and other management stuff -- all of which is very useful for
system administration and maintenance.
In my opinion, the power-button feature of iLO is the *least* useful
part.
> In my case WTI should be useful only in case of multiple failures, for
> example if both network switches fail, so heartbeat fails, and iLO fails
> too; then with /sbin/fence_dontcare I will have corruption. Is this
> correct?
With the dontcare hack, you can have corruption if the node stops
heartbeating (for any reason) and iLO does not respond at the time
fence_ilo is called.
Examples - Live-hang of the node with the iLO disconnected, too much
system load to get out heartbeats, network congestion/saturation, bad
cables, routing problems, internal problem in the switch, ARP storms,
power surges, iLO bugs/failure, too many people logged in to iLO, etc.
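To make that failure mode concrete, here is a toy Python model. It is only a sketch: none of these names are the real fence_ilo or fence_dontcare code, and it assumes the dontcare agent simply reports success without touching the node. It compares falling back to dontcare with falling back to a remote power switch when iLO does not answer.

```python
from typing import Callable, Sequence


class Node:
    """Hypothetical stand-in for a cluster member."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.powered_on = True  # a hung node may still be writing to shared storage


# A fence agent returns True when it claims the node has been fenced.
FenceAgent = Callable[[Node], bool]


def ilo_unreachable(node: Node) -> bool:
    # Assumption: iLO does not respond, so this agent reports failure.
    return False


def dontcare(node: Node) -> bool:
    # Assumption: reports success without doing anything to the node.
    return True


def power_switch(node: Node) -> bool:
    # Assumption: the remote power controller really cuts power.
    node.powered_on = False
    return True


def fence(node: Node, agents: Sequence[FenceAgent]) -> bool:
    """Try agents in order and stop at the first one that reports success."""
    return any(agent(node) for agent in agents)


def corruption_possible(node: Node, fence_reported_ok: bool) -> bool:
    # The cluster recovers the node's resources once fencing reports success.
    # If the node is in fact still powered, both sides may write the same data.
    return fence_reported_ok and node.powered_on


hung = Node("node1")
ok = fence(hung, [ilo_unreachable, dontcare])
print(ok, corruption_possible(hung, ok))

hung = Node("node1")
ok = fence(hung, [ilo_unreachable, power_switch])
print(ok, corruption_possible(hung, ok))
```

In both runs fencing "succeeds", but only the power switch actually stops the node; with dontcare the cluster proceeds while the node may still be alive.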
I do not know all of the possible failure cases. That is why the
last cluster I set up has a remote power controller, even though all of
the nodes individually have iLO as well. Call me paranoid if you want,
but please, think about these two points:
(1) Uptime with corrupt data does not equal availability
... and, more importantly ...
(2) It *really* sucks to have to restore from backup when you could be
playing WoW...
> I will need a supplemental NIC for every server to connect to WTI,
Actually, it should be on the same network as the cluster uses for
communications, especially in two-node CMAN/DLM clusters; check out:
http://people.redhat.com/teigland.sca.pdf
-- Lon