[Linux-cluster] "Missed too many heartbeats" messages and hung cluster

Patrick Caulfield pcaulfie at redhat.com
Tue Jun 27 09:01:59 UTC 2006


Fabrizio Lippolis wrote:
> I have configured two machines in a cluster domain to run mysql and ldap
> services. Everything works correctly except that from time to time,
> seemingly at random, the two machines hang. Recently this is what I see
> in the log of the second machine:
> 
> Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from the
> cluster : Missed too many heartbeats


That message means that heartbeat messages are getting lost somehow,
either through an unreliable network link or because something else odd is
happening on the machine that prevents the heartbeat packets from reaching
the network.
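
As a rough illustration of the mechanism (this is not CMAN's actual code;
the class name, the 5-second interval and the threshold of 3 are all
assumptions made up for the example), a node can be declared dead once the
time since its last heartbeat spans more than an allowed number of
intervals:

```python
import time


class HeartbeatMonitor:
    """Hypothetical sketch: track last heartbeat per node, flag silent nodes."""

    def __init__(self, interval=5.0, max_missed=3):
        self.interval = interval      # assumed seconds between heartbeats
        self.max_missed = max_missed  # assumed tolerated missed heartbeats
        self.last_seen = {}           # node name -> time of last heartbeat

    def record(self, node, now=None):
        """Note that a heartbeat from `node` arrived."""
        self.last_seen[node] = time.monotonic() if now is None else now

    def missed(self, node, now=None):
        """Number of whole heartbeat intervals elapsed since the last one."""
        now = time.monotonic() if now is None else now
        return int((now - self.last_seen[node]) // self.interval)

    def dead_nodes(self, now=None):
        """Nodes that have missed more heartbeats than the threshold allows."""
        return sorted(
            node for node in self.last_seen
            if self.missed(node, now) > self.max_missed
        )
```

Note that the monitor cannot tell why the heartbeats stopped: a dropped
packet on the network and a stalled sender look exactly the same from the
receiving side.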

> 
> The two machines had to be reset to get them working again. Could
> anybody please explain what happened to cause this problem? I would also
> need a suggestion on how to configure a fence device so that the
> services can continue to work. As you can see, I currently have manual
> fencing configured, but that is not very useful. Thank you in advance.
> 


-- 

patrick



