[Linux-cluster] Cluster Suite v3 software watchdog
lhh at redhat.com
Wed Dec 21 18:45:02 UTC 2005
On Wed, 2005-12-21 at 16:25 -0200, Celso K. Webber wrote:
> Has anyone had this issue before? Or am I missing a step in
> configuring the software watchdog feature?
> Another question for the Red Hat people on the list: does this "software
> watchdog" work OK? I ask because it's enabled by default when you add a
> new member to the cluster. The Cluster Suite v3 manual says nothing
> about this resource either.
Yes, it works fine.
A few things could be happening:
(1) The NMI watchdog will reboot the machine if it detects an NMI hang.
Its timeout is only a few seconds.
(2) The cluster is extremely paranoid because you are not using a
STONITH device (power controller), and it's detecting internal hangs.
Try increasing the failover time.
(3) The cluster daemons are not getting scheduled due to system load. See the
man page for cludb(8) about clumembd%rtp - both may help.
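
To help rule case (1) in or out, here is a minimal Python sketch that checks whether the kernel NMI watchdog looks active on a member. It assumes the watchdog is turned on with the nmi_watchdog= boot option and that /proc/interrupts has an "NMI:" row, as on stock Linux kernels. The function names are made up for this illustration and are not part of Cluster Suite, and the script says nothing about the cluster's own software watchdog.

```python
import re
from pathlib import Path


def nmi_watchdog_setting(cmdline):
    """Return the value of nmi_watchdog=... on a kernel command line, or None."""
    match = re.search(r"(?:^|\s)nmi_watchdog=(\S+)", cmdline)
    return match.group(1) if match else None


def nmi_counts(interrupts):
    """Return the per-CPU NMI counters found in /proc/interrupts text."""
    for line in interrupts.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; newer kernels append a label.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    interrupts_path = Path("/proc/interrupts")
    if cmdline_path.exists() and interrupts_path.exists():
        print("nmi_watchdog boot option:",
              nmi_watchdog_setting(cmdline_path.read_text()))
        print("NMI counts per CPU:",
              nmi_counts(interrupts_path.read_text()))
```

If the boot option is missing and the NMI counts stay at zero between two runs, the NMI watchdog is probably not what is rebooting the machine, and (2) or (3) is the more likely cause.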