[Linux-cluster] Fence device, How it works

Rick Stevens rstevens at vitalstream.com
Wed Nov 9 16:48:18 UTC 2005


On Tue, 2005-11-08 at 16:46 -0800, Michael Will wrote:
> I was more thinking along those lines:
> 
> 1. node A fails
> 2. node B reboots node A
> 3. node A fails again because it has not been fixed.
> 
> Now we could have a 2-3-2 loop. The worst case is that step 3 is
> actually:
> 3.1 node A comes up and starts reacquiring its resources
> 3.2 node A fails again because it has not been fixed
> 3.3 goto 2
> 
> Your recommendation f/g is exactly what I was wondering about
> as an alternative. I know it is possible, but I am trying to
> understand why it would not be the default behavior.
> 
> In active/passive heartbeat-style setups I set the nice-failback
> option so a node does not try to reclaim resources unless the other
> node fails, but I wonder what the best path is in a multi-node
> active/active setup.
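
For reference, the heartbeat option Michael mentions is a one-line
ha.cf setting.  The spelling below is from memory of heartbeat 1.x, so
check your version's documentation before relying on it:

    # /etc/ha.d/ha.cf - do not take resources back automatically;
    # only fail over when the other node actually fails
    nice_failback on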

IMHO, an auto reboot is never a good option.  Theoretically, node A
failed for some reason, and a human should examine it to find out what
the problem is/was.  Recovering a fenced node should require manual
operator intervention--if for no other reason than to verify that a
reboot will not cause a repeat of the incident.

Fencing should a) turn off the fenced node's ability to reacquire
resources; b) power down the fenced node (if possible); and c) alert the
operator that fencing occurred.
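
To make b) and c) concrete, here is a minimal sketch of the kind of
wrapper a site could put around its real fence agent.  Everything in it
is illustrative: fence_apc is just an example agent, and the addresses,
credentials and flags are placeholders to check against the agent's man
page, not a recipe.

    #!/bin/sh
    # Hypothetical fence wrapper: power the node OFF (not a reboot) and
    # tell a human about it.  Agent name, flags and values are examples.
    OUTLET="$1"                      # PDU outlet feeding the failed node
    OPERATOR="root@localhost"        # where the alert goes

    # b) power the fenced node down so it cannot come back on its own
    #    and try to reacquire resources
    fence_apc -a 10.0.0.5 -l apc -p apc -n "$OUTLET" -o off || exit 1

    # c) alert the operator that fencing occurred
    echo "Outlet $OUTLET fenced (powered off) at $(date)" \
        | mail -s "cluster fence event" "$OPERATOR"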

> Lon Hohberger wrote:
> > On Tue, 2005-11-08 at 07:52 -0800, Michael Will wrote:
> >   
> >>> Power-cycle. 
> >>>       
> >> I always wondered about this. If the node has a problem, chances are
> >> that rebooting does not fix it. Now if the node comes up
> >> semi-functional and attempts to regain control over the resources
> >> that it owned before, that could be bad. Should it not rather be
> >> shut down so human intervention can fix it before it is made
> >> operational again?
> >>     
> >
> > This is a bit long, but maybe it will clear some things up a little.  As
> > far as a node taking over a resource it thinks it still has after a
> > reboot (without notifying the other nodes of its intentions), that would
> > be a bug in the cluster software, and a really *bad* one too!
> >
> > A couple of things to remember when thinking about failures and fencing:
> >
> > (a) Failures are rare.  A decent PC has something like 99.95% uptime
> > (I wish I knew where I heard/read this long ago) - with no
> > redundancy at all.  A server with ECC RAM, RAID for internal disks, etc.
> > probably has a higher uptime.
> >
> > (b) The hardware component most likely to fail is a hard disk (moving
> > parts).  If that's the root hard disk, the machine probably won't boot
> > again.  If it's the shared RAID set, then the whole cluster will likely
> > have problems.
> >
> > (c) I hate to say this, but the kernel is probably more likely to fail
> > (panic, hang) than any single piece of hardware.
> >
> > (d) Consider this (I think this is an example of what you said?):
> >     1. Node A fails
> >     2. Node B reboots node A
> >     3. Node A correctly boots and rejoins cluster
> >     4. Node A mounts a GFS file system correctly
> >     5. Node A corrupts the GFS file system
> >
> > What is the chance that 5 will happen without data corruption occurring
> > during or before 1?  Very slim, but nonzero - which brings me to my next
> > point...
> >
> > (e) Always make backups of critical data, no matter what sort of block
> > device or cluster technology you are using.  A bad RAM chip (e.g. a
> > parity RAM chip missing a double-bit error) can cause periodic, quiet
> > data corruption.  Chances of this happening are also very slim, but
> > again, nonzero.  Probably at least as likely to happen as (d).
> >
> > (f) If you're worried about (d) and are willing to take the expected
> > uptime hit for a given node when that node fails, even given (c), you
> > can always change the cluster configuration to turn "off" a node instead
> > of rebooting it. :)
> >
> > (g) You can chkconfig --del the cluster components so that they don't
> > automatically start on reboot; same effect as (f): the node won't
> > reacquire the resources if it never rejoins the cluster...
> >
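To make (f) and (g) above concrete: in a RHCS-style cluster.conf, the
per-node fence device entry can usually be told to power the node off
instead of rebooting it.  The fragment below is only a sketch - the
exact attribute name ("option" vs. "action") and the accepted values
depend on the fence agent and version, so check the agent's
documentation:

    <clusternode name="nodeA" votes="1">
      <fence>
        <method name="power">
          <!-- illustrative only: power off instead of the default reboot -->
          <device name="apc1" port="3" option="off"/>
        </method>
      </fence>
    </clusternode>

And (g) is just a matter of pulling the cluster services out of the
boot sequence, so a rebooted node stays out of the cluster until a
human puts it back.  The service names below are the usual ones from
that era's RHCS stack; adjust them to whatever you actually run:

    # run on the node you want kept out of the cluster after a reboot
    for svc in rgmanager gfs clvmd fenced cman ccsd; do
        chkconfig --del $svc
    done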
> >
> >   
> >> I/O fencing instead of power fencing kind of works like this: you undo
> >> the I/O block once you know the node is fine again.
> >>     
> >
> > Typically, we refer to that as "fabric level fencing" vs. "power level
> > fencing"; both fit the I/O fencing paradigm of preventing a node from
> > flushing buffers after it has misbehaved.
> >
> > Note that typically the only way to be 100% positive a node has no
> > buffers waiting after it has been fenced at the fabric level is a hard
> > reboot.
> >
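For illustration, fabric-level fencing in practice looks something like
the cycle below.  The agent name and flags are from memory of that
era's Brocade agent and may well differ on your hardware, so treat this
purely as a sketch of the disable / verify / re-enable sequence:

    # cut the failed node off from shared storage by disabling its
    # SAN switch port (fabric-level fence)
    fence_brocade -a san-switch -l admin -p secret -n 5 -o disable

    # only after the node has been examined (and ideally hard rebooted,
    # so no stale buffers remain) does a human re-enable the port
    fence_brocade -a san-switch -l admin -p secret -n 5 -o enable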
> > Many administrators will reboot a failed node as a first attempt to fix
> > it anyway - so we're just saving them a step :)  (Again, if you want,
> > you can always do (f) or (g) above...)
> >
> > -- Lon
> >
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> >   
> 
> 
----------------------------------------------------------------------
- Rick Stevens, Senior Systems Engineer     rstevens at vitalstream.com -
- VitalStream, Inc.                       http://www.vitalstream.com -
-                                                                    -
-    "Hello. My PID is Inigo Montoya.  You `kill -9'-ed my parent    -
-                     process.  Prepare to vi."                      -
----------------------------------------------------------------------



