[Linux-cluster] "dlm_controld[nnnn]: cluster is down, exiting" on node1 when starting node2

David Teigland teigland at redhat.com
Fri Jun 5 16:49:51 UTC 2009


On Fri, Jun 05, 2009 at 12:50:57PM -0400, Charlie Brady wrote:
> 
> On Fri, 5 Jun 2009, David Teigland wrote:
> 
> >On Fri, Jun 05, 2009 at 11:42:59AM -0400, Charlie Brady wrote:
> >>
> >>On Fri, 5 Jun 2009, David Teigland wrote:
> >>
> >>>On Thu, Jun 04, 2009 at 04:23:13PM -0400, Charlie Brady wrote:
> >>>>Jun  4 10:55:34 sun4150node1 dlm_controld[7916]: cluster is down, exiting
> >>>>Jun  4 10:55:34 sun4150node1 fenced[7910]: cluster is down, exiting
> >>>>Jun  4 10:55:34 sun4150node1 gfs_controld[7922]: cluster is down, exiting
> >>>>Jun  4 10:55:35 sun4150node1 qdiskd[8128]: <err> cman_dispatch: Host is down
> >>>
> >>>They are all complaining that the cluster is down, which is a polite way
> >>>of saying that aisexec has died/crashed/failed/killed/gone-away.
> >>
> >>Thanks. Why might that have occurred? Where would I look for clues? How
> >>can I increase logging output from aisexec?
> >
> >If you're lucky it'll leave a core file, otherwise aisexec is notorious for
> >disappearing without leaving any clues about why.
> 
> That's very disconcerting to hear. Doesn't sound like HA. :-(

To clarify, aisexec does not often disappear; it's very reliable.  The point
was that in the rare case when it does, it's notorious for not leaving any
reasons behind.
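
Since aisexec itself may leave nothing behind, the daemons' own "cluster is
down, exiting" messages are often the only timestamp you get for when it went
away. A minimal sketch of pulling those lines out of a syslog file follows. It
assumes the message format shown in the log excerpt quoted above; the function
name find_cluster_down and the SAMPLE list are hypothetical, made up for
illustration, and are not part of any cluster tool.

```python
import re
from collections import defaultdict

# Matches syslog lines of the form quoted in this thread, e.g.:
#   Jun  4 10:55:34 sun4150node1 fenced[7910]: cluster is down, exiting
# The format is an assumption based on that excerpt only.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w.-]+)\[(?P<pid>\d+)\]:\s+cluster is down, exiting"
)


def find_cluster_down(lines):
    """Return {host: [(timestamp, daemon, pid), ...]} for matching lines."""
    hits = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            hits[m.group("host")].append(
                (m.group("ts"), m.group("daemon"), int(m.group("pid")))
            )
    return dict(hits)


# Sample input copied from the log excerpt in this thread.
SAMPLE = [
    "Jun  4 10:55:34 sun4150node1 dlm_controld[7916]: cluster is down, exiting",
    "Jun  4 10:55:34 sun4150node1 fenced[7910]: cluster is down, exiting",
    "Jun  4 10:55:34 sun4150node1 gfs_controld[7922]: cluster is down, exiting",
    "Jun  4 10:55:35 sun4150node1 qdiskd[8128]: <err> cman_dispatch: Host is down",
]

if __name__ == "__main__":
    for host, events in find_cluster_down(SAMPLE).items():
        for ts, daemon, pid in events:
            print(f"{host}: {daemon}[{pid}] exited at {ts}")
```

The function accepts any iterable of lines, so an open file handle for
wherever syslog writes on your system can be passed in place of SAMPLE. The
qdiskd line is deliberately not matched, because it reports the same failure
with a different message. Several daemons on one host exiting within the same
second points to a common cause, which here is aisexec.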

Dave



