[Linux-cluster] cluster failed after 53 hours

Patrick Caulfield pcaulfie at redhat.com
Wed Jan 19 08:50:08 UTC 2005


On Tue, Jan 18, 2005 at 03:10:20PM -0800, Daniel McNeil wrote:
> 
> There is a DLM ASSERT farther down in the log that shows error = -105,
> which is ENOBUFS.  Is this happening after the node has decided
> to leave the cluster?  I just want to make sure an out-of-memory
> problem isn't causing the problem.
> 

Unfortunately it could be, or it may not be. :( 
lowcomms_get_buffer() can return NULL if either a) there is no memory to
allocate a page, or b) the DLM has been shut down. In both cases -ENOBUFS is
the result, so the error code alone doesn't tell you which one happened. On
balance I would suspect that b) is more likely in this situation.
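
To illustrate why the two cases look the same to the caller, here is a rough
user-space sketch. It is not the actual DLM code: the sketch_* names and the
two flags are made up for illustration, and malloc() stands in for the
kernel page allocation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only, not real DLM symbols. */
static bool sketch_dlm_running = true;   /* cleared once the DLM is shut down */
static bool sketch_alloc_fails = false;  /* simulates running out of memory */

/*
 * Modelled loosely on the behaviour described for lowcomms_get_buffer():
 * NULL comes back for two unrelated reasons, and nothing in the return
 * value says which one it was.
 */
static void *sketch_get_buffer(size_t len)
{
    if (!sketch_dlm_running)
        return NULL;        /* case b): the DLM has been shut down */
    if (sketch_alloc_fails)
        return NULL;        /* case a): no memory to allocate a page */
    return malloc(len);
}

/* The caller can only map "no buffer" onto a single error code. */
static int sketch_send(size_t len)
{
    void *buf = sketch_get_buffer(len);

    if (buf == NULL)
        return -ENOBUFS;    /* same result for case a) and case b) */

    /* ... the message would be filled in and queued here ... */
    free(buf);
    return 0;
}
```

With either flag flipped, sketch_send() returns -ENOBUFS, which is why the
assert on its own can't settle whether memory was the problem.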

One oddity in that log is that the DLM took 10 minutes to shut down after CMAN
decided it had to leave the cluster - or did those 34980 lines have to go down a
serial console? 

-- 

patrick



