[Linux-cluster] cluster failed after 53 hours

Patrick Caulfield pcaulfie at redhat.com
Tue Jan 18 08:48:30 UTC 2005


On Mon, Jan 17, 2005 at 05:31:33PM -0800, Daniel McNeil wrote:
> My 3 node cluster ran tests for 53 hours before hitting a problem.
> 
> 
> Node cl031 hit the first problem: "CMAN: killed by STARTTRANS or
> NOMINATE".  There is also a DLM assert on cl031, but it comes
> after a whole bunch of debug output.  The full logs are
> here (http://developer.osdl.org/daniel/GFS/test.12jan2005/)
> 
> Any ideas on what is going on?
> 
> Here is simplified output (in the README file):
> test started Jan Wed 12 17:18
> hung after Fri Jan 14 22:00
> 
> cl031 got an error in just under 53 hours.
> ==========================================
> Jan 14 22:00:38 cl031 kernel: CMAN: node cl031a has been removed from the cluster : No response to messages

It's the usual thing: missing messages.

patrick



