[Linux-cluster] test hung after 36 hours

David Teigland teigland at redhat.com
Tue Apr 12 03:30:26 UTC 2005


On Mon, Apr 11, 2005 at 05:13:06PM -0700, Daniel McNeil wrote:
> I started my mount/tar/rm/ tests on Apr  4 17:41 and I hit
> a problem at Apr  6 05:30.  So the test ran for 36 hours.
> cl030 and cl031 were getting "SM: process_reply invalid"
> messages and cl032 got "No response" and "Missed too many
> heartbeats"

The SM messages are an effect of CMAN removing nodes.  There's a fair
chance that this recent fix will help:
http://sources.redhat.com/ml/cluster-cvs/2005-q2/msg00018.html

-- 
Dave Teigland  <teigland at redhat.com>



