[Linux-cluster] Network hiccup + power-fencing = both nodes go down (Red Hat Cluster 4)

Patrick Caulfield pcaulfie at redhat.com
Tue Jan 17 14:03:23 UTC 2006


Jeff Harr wrote:
> Thanks Patrick.  I have upped my deadnode_timeouts to 120 each.  
> 
> My worry though is the box somehow rebooting and joining faster than the
> other can wait its 120 seconds and take over the cluster.  Is there
> another timeout value that I can tweak to keep the original, crashed
> node from rebooting and joining too quickly?  Unfortunately, when the
> boxes crash they seem to come right back up and not stay dead.  I think
> this might be ILO behavior, but not sure.  I know when I shutdown -hy
> now, they stay down, and when the power-fencing takes place they stay
> down too, but not for crashes.
> 

If the crashed node tries to rejoin while the other node still thinks it's in
the cluster, its join request will be rejected and the join should fail. The
surviving node will of course still think the crashed node is alive, but it
won't be able to talk to it because the rebooted node doesn't have any cluster
services running.

When the surviving node notices that the other node has gone, it should fence
it (causing another power cycle!), so things should be OK.
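
To make that sequence concrete, here is a toy model of the surviving node's
view of membership. It is a hypothetical sketch, not how CMAN is implemented:
the class and method names are made up, heartbeats are reduced to timestamps,
and "fencing" just records the node name in a list. The 120-second value is
the deadnode_timeout mentioned above.

```python
from dataclasses import dataclass, field

# Hypothetical model of the join/fence behaviour described above.
# This is NOT CMAN's code; names and structure are illustrative only.


@dataclass
class SurvivingNode:
    deadnode_timeout: int = 120                  # seconds before a silent node is declared dead
    members: dict = field(default_factory=dict)  # node name -> time we last heard from it
    fenced: list = field(default_factory=list)   # stands in for real power-fence actions

    def heartbeat(self, node, now):
        """Record that a cluster member is alive."""
        self.members[node] = now

    def join_request(self, node, now):
        """Reject a join from a node we still believe is a member."""
        if node in self.members:
            return False
        self.members[node] = now
        return True

    def tick(self, now):
        """Declare silent members dead once the timeout elapses, then fence them."""
        for node, last_seen in list(self.members.items()):
            if now - last_seen >= self.deadnode_timeout:
                del self.members[node]
                self.fenced.append(node)


if __name__ == "__main__":
    survivor = SurvivingNode()
    survivor.heartbeat("node2", 0)             # node2 is healthy at t=0, then crashes
    print(survivor.join_request("node2", 30))  # fast reboot: join rejected, no heartbeats follow
    survivor.tick(120)                         # timeout expires: node2 is declared dead and fenced
    print(survivor.fenced)
    print(survivor.join_request("node2", 200)) # after the fence cycle, a fresh join succeeds
```

The point of the model is that a fast reboot cannot "win" the race: a rejected
join does not refresh the node's heartbeat, so the timeout still expires and
the fence still happens.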

Are you seeing actual problems?
-- 

patrick



