[Linux-cluster] Re: generic fencing, stonith, etc [was: gfs and fencing]

Wed Nov 16 19:04:23 UTC 2005

On Wed, 2005-11-16 at 12:14 +0300, Denis Medvedev wrote: 
> Why not have ssh -like fencing? On a separate channel a ssh command can 
> be issued to another host to reboot or to just kill cluster processes?
> In Linux-HA stonith has a ssh feature.

This presumes the host is reachable, and will do the right thing.  While
the latter may be an acceptable premise, the former is generally not.

SSH fencing will never succeed if the kernel has panicked.  There is no
limit to how long a live hang can last, so giving up after an arbitrary
timeout is a poor idea.  The node, then, becomes a single point of
failure.  Not good, but certainly not the worst thing that can happen
(e.g. no fencing at all, as with the 'null' driver).

Certainly, the 'ssh' agent could automatically recover in a few more
cases than the 'manual' agent ('meatware' in Linux-HA lingo).

> Moreover, why not have a fence_stonith which will invoke stonith as a 
> fencing agent?

Someone mentioned recently (elsewhere) that it would be great if we had
a fencing agent which just called some user-specified command to do
things (and could substitute variables if necessary, like %p ->
password, %l -> login, etc.):

For example:

  <fencedevice agent="fence_generic" name="foo" exec="/usr/bin/foo"/>

... per-node config:

  <device name="foo"
   append="--node node1 --username user --password pass"/>

When "node1" needs fencing, fenced would call:

   /usr/bin/foo --node node1 --username user --password pass

Assuming the above command is guaranteed to reboot node1 (or turn it
off), the cluster may safely recover.
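
To make the idea concrete, here is a minimal sketch of what such a
wrapper might do, written in Python purely for illustration.  No
fence_generic agent exists; the function names (build_command, fence),
the "run" parameter, and the choice to treat only exit status 0 as a
successful fence are all assumptions, not an existing interface.

```python
import re
import shlex
import subprocess


def build_command(exec_path, append, values=None):
    """Return argv: the exec path followed by the per-node arguments.

    Placeholders such as %p or %l are replaced in a single pass from
    `values` (e.g. {"p": "pass", "l": "user"}); unknown placeholders
    are left untouched.
    """
    values = values or {}

    def substitute(arg):
        return re.sub(r"%(\w)",
                      lambda m: values.get(m.group(1), m.group(0)),
                      arg)

    return [exec_path] + [substitute(a) for a in shlex.split(append)]


def fence(exec_path, append, values=None, run=subprocess.call):
    """Run the user-specified command; report success only on exit 0.

    Anything else (non-zero exit, command missing) counts as a failed
    fence, so the caller must not proceed with recovery.
    """
    argv = build_command(exec_path, append, values)
    try:
        return run(argv) == 0
    except OSError:
        return False
```

With the configuration above, fence("/usr/bin/foo", "--node node1
--username user --password pass") would run exactly the command shown.
The wrapper can only report what the command reports; whether the node
is really off still depends entirely on the command itself.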

Maybe this "generic" agent is the right solution after all, and it would
fix your second request: you can just call the stonith command if that
is what your configuration requires, and that can call the ssh STONITH
plugin, right?  After all, stonith could call fence_node if someone
wrote an external plugin specification for fence_node, too.

Well, if it were up to me, I would say "no", but (perhaps...fortunately)
it is not up to me.  I worry about two things:

(a) I do not think this is a supportable solution, due to the
limitless array of possible configurations for hardware we have never
heard of.  All support for this agent would likely be limited to this
mailing list and bugs in the actual agent itself.

(b) Rather than developing a proper fence agent for their particular
hardware, people will use this as a means to an end, which will not
improve the linux-cluster project as a whole.  That sucks :(

So, given that it is not up to me, who else wants this?  (I'm talking
to you lurkers out there.)  I am waiting for the flaming mantis to burn
me for even suggesting such a thing...

-- Lon