[Linux-cluster] Re: Using qdisk and a watchdog timer to eliminate power fencing single point of failure?

Mon Feb 25 18:00:46 UTC 2008

On Sat, 2008-02-23 at 09:06 -0800, Jonathan Biggar wrote:

> > Hardware watchdog timers are going to be more reliable than just about
> > anything qdiskd could provide.
> 
> Ok, I get it.  It's probably a couple of orders of magnitude more 
> reliable, but since it relies only on timing, there's no real *positive* 
> indication that the fencing succeeded, so it's really only best-effort. 
>   Even though it would take three failures (network disruption of 
> heartbeat, quorumd failing to reboot the node and the watchdog timer 
> failing as well), there's still a slim, slim chance that the node is 
> still trying to write to the SAN.  If I want to guarantee that there's 
> never a split brain, then this isn't good enough.

Correct.  Although, plenty of software relies on hardware watchdog
timers.

While it's not externally verifiable, I would estimate that a correctly
configured WDT is about as likely to fail as a fence device is to
falsely claim success (which could happen ... in theory).

The key is figuring out how to make it reliable on the software side.  I
think that the watchdog daemon is probably the answer (or pretty darn
close).

Practically speaking, when evaluating fencing or related technologies,
it helps to enumerate the failure cases you're worried about and which
component's job it is to handle each one.  For example:

 * Kernel panic -> Handled by WDT
 * System loses power -> Don't care (dead anyway)
 * Watchdog daemon hang -> Handled by WDT
 * ...
 * Network disconnect -> Watchdog daemon?
 * Cluster software hangs/crashes -> Watchdog daemon?
 * ...
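
As a rough illustration of the split above, here is a minimal Python
sketch of the daemon side.  It assumes the usual Linux behavior that
any write to the watchdog device resets the timer.  The function name,
the check callables and the interval are all hypothetical, and this is
not how any particular watchdog daemon is implemented.

```python
import time
from typing import BinaryIO, Callable, Iterable, Optional


def pet_while_healthy(
    dev: BinaryIO,
    checks: Iterable[Callable[[], bool]],
    interval: float,
    max_rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Pet the watchdog once per interval while every check passes.

    Returns the number of times the timer was reset.  `dev` is any
    writable binary file object (assumption: the opened watchdog device).
    """
    checks = list(checks)
    pets = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        # Hypothetical checks, e.g. "network link up", "cluster software alive".
        if not all(check() for check in checks):
            # Stop petting: the hardware timer expires and resets the node.
            break
        dev.write(b"\0")  # any write resets the timer
        dev.flush()
        pets += 1
        sleep(interval)
    return pets
```

On a real node, dev would be something like
open("/dev/watchdog", "wb", buffering=0), and the interval would need to
be comfortably shorter than the WDT timeout.  A kernel panic or a hang
in this loop has the same effect as a failed check: the writes stop and
the WDT fires.  What happens when the device is closed depends on the
driver and its configuration, so a real daemon would keep it open.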

The slim cases where stuff might break generally involve the (properly
configured) watchdog daemon misbehaving at the same time the node /
cluster software misbehaves:

 * Network disconnect + watchdog daemon doesn't notice ...
 * Cluster software hang/crash + watchdog daemon doesn't notice ...
 * ...
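
One way to narrow the "doesn't notice" window is to make the check
demand evidence of progress rather than mere process existence.  The
sketch below assumes, purely for illustration, that the cluster
software touches a heartbeat file on every pass of its main loop; the
function name, the file and the age threshold are all hypothetical.

```python
import os
import time
from typing import Optional


def heartbeat_is_fresh(
    path: str, max_age: float, now: Optional[float] = None
) -> bool:
    """Return True only if `path` was modified within `max_age` seconds.

    A hung process still exists but stops touching the file, so its
    heartbeat goes stale and the check fails.
    """
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # A missing or unreadable heartbeat file counts as a failure.
        return False
    current = time.time() if now is None else now
    return current - mtime <= max_age
```

A check like this still cannot catch every case (for example, a
process that keeps touching the file while doing nothing useful), so it
shrinks the combined-failure window rather than closing it.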

Historically (i.e. RHEL3's clumanager), we had watchdog daemon support
built into the cluster membership layer.  There are advantages to this
design because it solves some of the problems in a fairly concrete way
(e.g. "cluster software hang" is a non-issue, since it would cause the
WDT to trigger).  Unfortunately, there are disadvantages as well (e.g.
not being able to check routing because the membership layer is
time-critical), which is why it's not in the current cluster software.

-- Lon