[Linux-cluster] fencing loop in a 2-node partitioned cluster

Gianluca Cecchi gianluca.cecchi at gmail.com
Fri Feb 27 10:35:10 UTC 2009


Unfortunately, doing this seems to have a problematic side effect.
I set -f 1 on one node and -f 10 on the other.
Now if I panic one node, it is fenced by the other one, but when it
restarts it stays stuck at
start fencing....
until, after a few minutes, it forms a cluster of its own and kills cman
on the other node
(the same problem as in bugzilla 485026, which you know well...)
I tried twice, and as soon as I remove the -f option from both cman
init scripts the panic scenario works correctly again.
So I think I will stay with the default, accepting a possible complete
cluster outage in case of total loss of the intra-cluster network
(it is bonded, but operations on the VLAN that compromise it
could still trigger the problem...)
But it is a pity that the cluster cannot fall back to the production
network if the heartbeat goes down (even kimberlite was able to do this..)
Or better, the quorum master should win and fence the other, or the
fencing should be service based....
There is something about this in the FAQ but it does not seem easy to
configure and get working...

;-(

On Tue, Feb 24, 2009 at 8:09 PM, Marc Grimme <grimme at atix.de> wrote:

> This time you're lucky cause it's just a fenced option:
>
> [root@generix2 ~]# fenced -h
> Usage:
>
> fenced [options]
>
> Options:
>
>  -c           All nodes are in a clean state to start
>  -j <secs>     Post-join fencing delay (default 6)
>  -f <secs>     Post-fail fencing delay (default 0)
>  -O <path>    Override path (default /var/run/cluster/fenced_override)
>  -D           Enable debugging code and don't fork
>  -h           Print this help, then exit
>  -V           Print program version information, then exit
>
> Command line values override those in cluster.conf.
> For an unbounded delay use <secs> value of -1.
>
> And you don't want to change it for all nodes the same. So add this (-f )
> option to the /etc/init.d/cman initscript in the function start_daemons to
> fenced. As there is no variable like FENCED_FAIL_DELAY you have to change the
> script ;( .
>
> Marc.
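
The per-node change described above can be sketched roughly as follows. This is only an illustration, not the contents of the stock /etc/init.d/cman: the helper name fenced_cmdline and the variable FENCED_FAIL_DELAY are hypothetical (as noted above, the initscript has no such variable), and only the -f option and its -1 "unbounded" value come from the fenced help output.

```shell
#!/bin/sh
# Hypothetical helper: build the fenced command line for one node.
# FENCED_FAIL_DELAY is an assumed name; the stock initscript does not define it.
fenced_cmdline() {
    delay="$1"
    case "$delay" in
        "")
            # No per-node delay set: keep the default invocation.
            echo "fenced"
            ;;
        -1)
            # -1 means an unbounded delay, per the fenced help text.
            echo "fenced -f -1"
            ;;
        *[!0-9]*)
            # Anything else containing a non-digit is rejected.
            echo "invalid post-fail delay: $delay" >&2
            return 1
            ;;
        *)
            echo "fenced -f $delay"
            ;;
    esac
}

# Print the command this node would run; one node might set the
# variable to 1 and the other to 10.
fenced_cmdline "${FENCED_FAIL_DELAY:-}"
```

With the variable set to 1 on one node and 10 on the other, the sketch prints "fenced -f 1" and "fenced -f 10" respectively. Actually wiring this into start_daemons still means editing the script by hand on each node.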



