[Linux-cluster] cluster failed after 53 hours
pcaulfie at redhat.com
Tue Jan 18 08:48:30 UTC 2005
On Mon, Jan 17, 2005 at 05:31:33PM -0800, Daniel McNeil wrote:
> My 3 node cluster ran tests for 53 hours before hitting a problem.
> Node cl031 hit the 1st problem CMAN: killed by STARTTRANS or
> NOMINATE. There is a DLM assert on cl031 also, but that is
> after a whole bunch of debug output. The full logs are
> here (http://developer.osdl.org/daniel/GFS/test.12jan2005/)
> Any ideas on what is going on?
> Here is simplified output (in the README file):
> test started Jan Wed 12 17:18
> hung after Fri Jan 14 22:00
> cl031 got an error in just under 53 hours.
> Jan 14 22:00:38 cl031 kernel: CMAN: node cl031a has been removed from the cluster : No response to messages
It's the usual thing: missing messages.