R: R: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster

Leandro Dardini l.dardini at comune.prato.it
Tue Jun 27 10:04:36 UTC 2006


 

> -----Original Message-----
> From: linux-cluster-bounces at redhat.com 
> [mailto:linux-cluster-bounces at redhat.com] On behalf of 
> Fabrizio Lippolis
> Sent: Tuesday, 27 June 2006 11:52
> To: linux clustering
> Subject: Re: R: [Linux-cluster] "Missed too many heartbeats" 
> messages and hung cluster
> 
> Leandro Dardini wrote:
> 
> > If something happens between the two machines, they fence each other.
> 
> I have configured manual fencing, but as I wrote it's not very 
> useful since, I think, it requires manual intervention, which 
> may not be possible immediately. Therefore I am looking for 
> a way to keep the services running even if such a thing 
> happens. This is not the first time the problem has arisen, 
> apparently for no reason, though the last time was 
> a long time ago.
> 
> > You can try to "ping" each other and see, when the problem 
> arises, the connectivity state.
> 
> Sometimes the machines are completely locked up and it's not 
> even possible to log in. A hard power-off is 
> necessary in this case. Sometimes it looks like only the cluster 
> service is locked up and I can still ping the other machine 
> even though the cluster is not working.

This is really bad. It smells like a hardware problem or a buggy kernel driver. Try stress testing each machine individually, without cluster support. I usually start with a memtest from a Knoppix CD and then build a kernel to stress the CPU. Try transferring a huge chunk of data to test the LAN.
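
For the LAN part, something along these lines could serve as a rough sketch. It is not a tuned benchmark: it pushes random data through a TCP socket and compares a SHA-256 on both ends, so a flaky NIC, cable or driver has a chance to show up as a stall, a reset or a mismatch. The function names (receive, send, loopback_selftest) and the chunk size are my own choices for illustration, not anything from the cluster suite.

```python
import hashlib
import os
import socket
import threading
import time

CHUNK = 1 << 20  # 1 MiB per send; an arbitrary choice for this sketch


def receive(server_sock, result):
    """Accept one connection, read until EOF, record byte count and SHA-256."""
    conn, _ = server_sock.accept()
    digest = hashlib.sha256()
    total = 0
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            digest.update(data)
            total += len(data)
    result["bytes"] = total
    result["sha256"] = digest.hexdigest()


def send(host, port, total_bytes):
    """Send total_bytes of random data; return (sha256 hex, seconds elapsed)."""
    digest = hashlib.sha256()
    start = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        sent = 0
        while sent < total_bytes:
            block = os.urandom(min(CHUNK, total_bytes - sent))
            digest.update(block)
            sock.sendall(block)
            sent += len(block)
    return digest.hexdigest(), time.monotonic() - start


def loopback_selftest(total_bytes=8 * CHUNK):
    """Run both ends on 127.0.0.1 just to check that the script itself works."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    worker = threading.Thread(target=receive, args=(server, result))
    worker.start()
    sent_hash, _elapsed = send("127.0.0.1", port, total_bytes)
    worker.join()
    server.close()
    return result["bytes"] == total_bytes and result["sha256"] == sent_hash


if __name__ == "__main__":
    print("loopback ok:", loopback_selftest())
```

The loopback run only proves the script works. To exercise the real link you would need a little glue: bind the listening socket on one node's cluster interface, call send() from the other node with a much larger total_bytes, and leave it running for a while in both directions.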

Leandro

> 
> > Maybe a "too intelligent" switch is handling the 
> traffic and applying some sort of "traffic shaping and control".
> 
> There is nothing like that; the two machines are connected by 
> a 1 Gbit crossover cable, not even a long one, provided by HP with 
> the two machines.
> 
> -- 
> Fabrizio Lippolis                
> fabrizio.lippolis at aurigainformatica.it
> Auriga Informatica s.r.l.            Via Don Guanella 15/B - 
> 70124 Bari
> Tel.: 080/5025414 - Fax: 080/5027448 - 
> http://www.aurigainformatica.it/
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



