[Linux-cluster] test hung after 36 hours
David Teigland
teigland at redhat.com
Tue Apr 12 03:30:26 UTC 2005
On Mon, Apr 11, 2005 at 05:13:06PM -0700, Daniel McNeil wrote:
> I started my mount/tar/rm/ tests on Apr 4 17:41 and I hit
> a problem at Apr 6 05:30. So the test ran for 36 hours.
> cl030 and cl031 were getting "SM: process_reply invalid"
> messages and cl032 got "No response" and "Missed too many
> heartbeats"
The SM messages are an effect of CMAN removing nodes. There's a fair
chance that this recent fix will help:
http://sources.redhat.com/ml/cluster-cvs/2005-q2/msg00018.html
--
Dave Teigland <teigland at redhat.com>