[Linux-cluster] cluster failed after 53 hours

Daniel McNeil daniel at osdl.org
Wed Jan 19 18:47:57 UTC 2005


On Wed, 2005-01-19 at 00:50, Patrick Caulfield wrote:
> On Tue, Jan 18, 2005 at 03:10:20PM -0800, Daniel McNeil wrote:
> > 
> > There is a DLM ASSERT farther down in the log that shows error = -105
> > which is ENOBUFS.  Is this happening after the node has decided
> > to leave the cluster?  I just want to make sure an out-of-memory
> > condition isn't causing the problem.
> > 
> 
> Unfortunately it could be, or it may not be. :( 
> lowcomms_get_buffer() can return NULL if either a) there is no memory to
> allocate a page, or b) the DLM has been shut down. If that happens, -ENOBUFS is
> the result. On balance I would suspect that b) is more likely in this situation.
> 
> One oddity in that log is that the DLM took 10 minutes to shut down after CMAN
> decided it had to leave the cluster - or did those 34980 lines have to go down a
> serial console? 

Yup.  Serial console.

Daniel
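
To illustrate why the -ENOBUFS in the log is ambiguous, here is a minimal
user-space sketch of the behaviour Patrick describes. It is not the DLM
source: mock_lowcomms, mock_get_buffer() and mock_send() are hypothetical
names made up for this example, and malloc() stands in for the kernel page
allocation. The only point is that both failure causes return NULL from the
buffer getter, so the caller sees the same error either way.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for lowcomms state; not the real kernel structure. */
struct mock_lowcomms {
    bool out_of_memory; /* case a): no memory to allocate a page */
    bool shut_down;     /* case b): the DLM has been shut down */
};

/* Mock buffer getter: returns NULL for either cause, with no way
 * for the caller to tell the two apart. */
static void *mock_get_buffer(const struct mock_lowcomms *lc, size_t len)
{
    if (lc->shut_down)
        return NULL;            /* case b) */
    if (lc->out_of_memory)
        return NULL;            /* case a) */
    return malloc(len);         /* stand-in for a page allocation */
}

/* Mock caller: any NULL buffer is reported as -ENOBUFS. */
static int mock_send(const struct mock_lowcomms *lc, size_t len)
{
    void *buf = mock_get_buffer(lc, len);

    if (buf == NULL)
        return -ENOBUFS;

    /* A real sender would fill and queue the buffer here. */
    free(buf);
    return 0;
}
```

Calling mock_send() with either flag set gives the same -ENOBUFS (-105 on
Linux, as in the log), so the error code alone cannot distinguish an
allocation failure from a shutdown that has already happened.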



