[Linux-cluster] How to disable node?

Marc - A. Dahlhaus [ Administration | Westermann GmbH ] mad at wol.de
Tue Sep 1 11:11:21 UTC 2009


On Tuesday, 01.09.2009 at 12:48 +0200, Jakov Sosic wrote:
> On Tue, 01 Sep 2009 12:29:36 +0200
> "Marc - A. Dahlhaus [ Administration | Westermann GmbH ]" <mad at wol.de>
> wrote:
> 
> > It isn't misbehaving at all here.
> > 
> > The job of RHCS in this case is to protect your data against
> > failure.
> > 
> > If fenced can't fence a node successfully, RHCS will stall (because
> > it doesn't get a successful response from the fence agent) until
> > someone who knows what they are doing comes around to fix the
> > problem. If it didn't behave that way, a separated node could eat
> > up your data. It is the job of fenced to stop all activity until
> > fencing is working again.
> > 
> > This behaviour is perfectly fine IMO...
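
(For the archives: the usual way out of that stall, once a human has
verified that the node really is powered off, is the manual override -
from memory, something like:

    fence_ack_manual -n node3    # node name as listed in cluster.conf

but check the fence_ack_manual(8) manpage for the exact syntax on your
release.)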
> 
> Isn't that the mission of quorum? For example - if you have quorum
> you will run services, if you don't have quorum you won't. If there
> is a qdisk and a single node out of three is separated, that node
> can't have quorum - so it can't run services?
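
A quorum disk only changes the vote arithmetic, it doesn't replace
fencing. As a rough three-node sketch (vote counts and label invented
for illustration), the relevant cluster.conf bits would be:

    <cman expected_votes="5"/>
    <quorumd interval="1" tko="10" votes="2" label="myqdisk"/>

Two nodes plus the qdisk hold 4 of 5 votes and stay quorate; a lone
node without the qdisk holds 1 and does not. But fenced still has to
fence the lost node before its services are recovered - quorum alone
never makes that safe.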
> 
> OK, I understand that this is the safer way... But that's why I was
> asking in the first place for a command to flag a node as missing
> completely, so that I can avoid all reconfigurations. Reconfiguring
> while a node is missing will trigger odd behaviour when the node
> comes back - it will be fenced constantly because it has the wrong
> config version.
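
That fence loop is driven by the config_version attribute at the top
of cluster.conf:

    <cluster name="mycluster" config_version="42">

(name and number invented). If you do change the config on a running
cluster, bump that number and push it out - from memory, on RHEL5-era
RHCS something like:

    ccs_tool update /etc/cluster/cluster.conf
    cman_tool version -r 42

so that a returning node is handed the current version instead of
fighting over a stale one.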
> 
> 
> > - You use system-dependent fencing like "HP iLO", which will be
> >   missing if your system is missing, and no independent fencing
> >   like an APC PowerSwitch...
> 
> Yes, but those are the only devices I have available for fencing. So
> that is a hardware limitation, over which I have no influence in
> this case. I already know that the fence devices are currently my
> only SPOF... But I can't help it.
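
For reference, an iLO entry in cluster.conf looks roughly like this
(hostname and credentials invented):

    <fencedevice agent="fence_ilo" name="ilo1"
                 hostname="node1-ilo" login="admin" passwd="secret"/>

If you ever do get a PDU, you can list it as a second <method> under
each node's <fence> block; fenced tries the methods in order, so the
PDU becomes the fallback when the iLO is dead.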
> 
> 
> >   Think about a power surge which kills both PSUs of a system. A
> >   system-dependent management device would be missing from your
> >   network in that case, leading to exactly the problem you're
> >   faced with.
> 
> I will take a look at whether APC UPSes have something like
> killpower for individual ports; if not, I will set up fake manual
> fencing to get around this problem. Thank you.

It's actually the "APC Switched Rack PDUs" that you should look at.
You can get an 8-port device on a small budget...
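
In cluster.conf such a PDU would look roughly like this (address,
login and outlet number invented, adjust to your gear):

    <fencedevice agent="fence_apc" name="apc1"
                 ipaddr="192.168.0.50" login="apc" passwd="apc"/>

and per node, inside its <fence> section:

    <method name="1">
        <device name="apc1" port="3"/>
    </method>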

> > Your mistake is that you started fenced in normal mode, in which
> > it will fence all nodes it can't reach to avoid a possible
> > split-brain scenario. You need to start fenced in "clean start"
> > mode, which skips startup fencing (read the fenced manpage, it is
> > documented there), because you know that everything is all right.
> 
> Adding clean_start again presumes reconfiguring, just like removing
> a node and declaring the cluster two_node, and I wanted to avoid
> reconfigurations...

It's just a matter of starting fenced with "fenced -c" on your two
nodes. No cluster.conf fiddling needed at all...
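
On a node where the cluster stack is down, that is roughly (untested,
check the fenced(8) and fence_tool(8) manpages before trusting this):

    fenced -c          # start fenced, assuming all nodes are clean
    fence_tool join    # join the fence domain, no startup fencing

If you start everything through the init script instead: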

Search for "start_daemon fenced" in /etc/init.d/cman and add " -c"
after it. Remember to remove that again once your third node is back.
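
A one-liner for that edit could be (untested; it keeps a backup):

    sed -i.bak 's/start_daemon fenced/start_daemon fenced -c/' /etc/init.d/cman

Restore the .bak (or delete the " -c" again) once the third node has
rejoined.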

> Thank you very much.

You're welcome.

Marc



