[Linux-cluster] node fenced by dlm_controld on a clean shutdown

Mon Nov 19 15:23:19 UTC 2012

On Mon, Nov 19, 2012 at 10:39:20AM +0100, Jacek Konieczny wrote:
> On Mon, Nov 19, 2012 at 10:16:48AM +0100, Jacek Konieczny wrote:
> > It goes like that:
> > - resources using the shared storage are properly stopped by Pacemaker
> > - DRBD is cleanly demoted and unconfigured by Pacemaker
> > - Pacemaker cleanly exits
> > - CLVMD is stopped
> > - dlm_controld is stopped
> > - corosync is being stopped
> > 
> > and at this point the node is fenced (rebooted) by dlm_controld on
> > the other node. I would expect it to continue with a clean shutdown.
> > 
> > Any idea how to debug/fix it?
> > Is this '541 cpg_dispatch error 9' the problem?
> 
> I found a workaround: I added a 10-second pause between stopping
> dlm_controld and stopping corosync. The node now shuts down cleanly (it
> is not fenced). The '541 cpg_dispatch error 9' message still appears in
> the logs, though.

corosync-cfgtool -H is supposed to shut down corosync cleanly using the
cfg_shutdown_callback.  It looks like it may not be doing that.
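The workaround quoted above (a 10-second pause between stopping dlm_controld and stopping corosync) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `SHUTDOWN_ORDER`, `shutdown_stack`, `settle_after` and `settle_seconds` are hypothetical names, and `stop(name)` is assumed to wrap whatever stop command the local init system provides, since the thread does not say which one is used. As the thread itself notes, the delay is a workaround, not a fix for the clean-shutdown path that corosync-cfgtool -H is meant to provide.

```python
import time
from typing import Callable, List, Sequence

# Order taken from the thread. The real stop commands are distribution
# specific, so each entry is only a label passed to the caller's stop().
SHUTDOWN_ORDER: Sequence[str] = (
    "pacemaker",     # stops shared-storage resources, demotes DRBD, exits
    "clvmd",
    "dlm_controld",
    "corosync",
)


def shutdown_stack(
    stop: Callable[[str], None],
    settle_after: str = "dlm_controld",   # hypothetical: where to pause
    settle_seconds: float = 10.0,         # the 10-second pause from the thread
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Stop each component in order, pausing after `settle_after`.

    Returns the actions taken so a caller can log or inspect them.
    """
    actions: List[str] = []
    for name in SHUTDOWN_ORDER:
        stop(name)
        actions.append(f"stop {name}")
        if name == settle_after and settle_seconds > 0:
            sleep(settle_seconds)
            actions.append(f"sleep {settle_seconds:g}")
    return actions


if __name__ == "__main__":
    # Dry run: print instead of stopping anything and skip the real sleep.
    log = shutdown_stack(stop=lambda n: print("would stop", n),
                         sleep=lambda s: None)
    print(log)
```

Injecting `stop` and `sleep` keeps the ordering logic testable without touching a live cluster. Setting `settle_seconds=0` reproduces the original sequence that led to fencing.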