[Linux-cluster] STONITH

Mon Oct 9 20:53:24 UTC 2006

On Fri, 2006-10-06 at 12:10 +0100, Grant Waters wrote:

> Power cycling both nodes and the array fixes the problem, but I
> want to know what's causing it in the first place.  It doesn't appear
> to be related to load, although I can't rule that out - both outages
> were at approx 04:40 on a Friday. 

The tg3 link mysteriously disappearing and reappearing looks like the
culprit.  clumanager doesn't control the NIC link state, so here are a
few things to try:

(a) Raise the failover interval to 30 seconds.  If it's just a flaky
card/driver/cable/etc., this buys more time.

(b) cludb -p clumembd%rtp 10

Try this if you think it's a scheduling problem.

(c) cludb -p cluster%msgsvc_noarp 1 

Gets rid of "SIOCGARP..." errors.

(d) cludb -p clulockd%loglevel 4

Because running clulockd at debug level is a waste of resources.
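
If it helps, (b) through (d) can be collected into one small script.
This is only a sketch: the function names and the APPLY variable are
made up for illustration, and the only cludb invocations used are the
three shown above.  It prints the commands by default and runs them
only when APPLY=1 is set, assuming cludb is on the PATH of a cluster
member.  Item (a) is left out because no command for it is given here.

```shell
#!/bin/sh
# Sketch: apply the cludb settings from items (b)-(d).
# Dry run by default; set APPLY=1 to really call cludb.
# apply_setting, apply_tuning and APPLY are hypothetical names.

apply_setting() {
    # $1 = key in "daemon%option" form, $2 = value
    if [ "${APPLY:-0}" = "1" ]; then
        cludb -p "$1" "$2"
    else
        echo "cludb -p $1 $2"
    fi
}

apply_tuning() {
    apply_setting 'clumembd%rtp' 10         # (b) in case it is a scheduling problem
    apply_setting 'cluster%msgsvc_noarp' 1  # (c) gets rid of the "SIOCGARP..." errors
    apply_setting 'clulockd%loglevel' 4     # (d) takes clulockd off debug level
}

apply_tuning
```

Run it once without APPLY to check the output, then again with
APPLY=1 on the node you want to change.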

-- Lon