Re: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster

Leandro Dardini l.dardini at comune.prato.it
Thu Jun 29 07:27:38 UTC 2006


> -----Original Message-----
> From: linux-cluster-bounces at redhat.com
> [mailto:linux-cluster-bounces at redhat.com] On behalf of
> Fabrizio Lippolis
> Sent: Tuesday, June 27, 2006 15:36
> To: linux clustering
> Subject: Re: [Linux-cluster] "Missed too many heartbeats"
> messages and hung cluster
> 
> Patrick Caulfield wrote:
> 
> >> Jun 23 23:37:17 AICLSRV02 kernel: CMAN: removing node AICLSRV01 from
> >> the cluster : Missed too many heartbeats
> > 
> > 
> > That message means that the heartbeat messages are getting lost
> > somehow, either through an unreliable network link or something else
> > odd happening on the machine that prevents the heartbeat packets from
> > reaching the network.
> 
> This is very strange, since the two machines are connected by
> a gigabit crossover cable with no other device in
> between. Also, no firewall rules are configured on either machine.
> 
> By the way, I am currently using the manual fencing method, but
> it isn't very helpful and I would like to switch to a method
> that ensures a reliable service. Does that mean I have to buy a
> device that sits between the machines and connects their
> network and power cables? I am rather new to this, so any
> suggestion is welcome.
> 

A fencing device is required to guarantee write consistency. If a node fails to communicate with the other nodes and is not fenced, it can keep writing to the shared storage without any coordination, and bye bye GFS.
A fencing device does not have to be a power-fencing device. In my case it is the fibre channel switch: when a node has to be fenced, another node telnets to the fibre channel switch and turns off the failed node's port. This doesn't power-cycle the node, but it blocks its writes to the shared device. What kind of shared device are you using?
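
To make the idea concrete, here is a minimal sketch in Python of what fencing at the switch achieves. It is only a toy model and not how any real fence agent is implemented: the classes FabricSwitch and SharedDisk, the exception FencedError, and the port numbers 1 and 2 are all hypothetical, and nothing here talks to a real switch or to CMAN. The node names are taken from the log quoted above.

```python
# Toy model of fencing at the fibre channel switch. All class names and
# port numbers are hypothetical; no real switch or cluster is contacted.

class FencedError(Exception):
    """Raised when a fenced node tries to reach the shared device."""


class FabricSwitch:
    def __init__(self, node_ports):
        # node_ports maps a node name to the switch port it is cabled to.
        self.node_ports = dict(node_ports)
        self.enabled = {port: True for port in self.node_ports.values()}

    def disable_port_for(self, node):
        # Stand-in for "telnet to the switch and turn off the port".
        self.enabled[self.node_ports[node]] = False

    def path_is_open(self, node):
        return self.enabled[self.node_ports[node]]


class SharedDisk:
    def __init__(self, switch):
        self.switch = switch
        self.blocks = []

    def write(self, node, data):
        # Every write has to cross the switch, so a disabled port blocks it.
        if not self.switch.path_is_open(node):
            raise FencedError(f"{node} is fenced off from the shared device")
        self.blocks.append((node, data))


def demo():
    switch = FabricSwitch({"AICLSRV01": 1, "AICLSRV02": 2})
    disk = SharedDisk(switch)
    disk.write("AICLSRV01", "before fencing")

    # AICLSRV02 decides AICLSRV01 is gone and fences it at the switch.
    switch.disable_port_for("AICLSRV01")

    try:
        disk.write("AICLSRV01", "stale write")
        blocked = False
    except FencedError:
        blocked = True

    # The surviving node can still write; the fenced node was never rebooted.
    disk.write("AICLSRV02", "after fencing")
    return blocked, disk.blocks
```

Calling demo() shows the point of the design: the fenced node keeps running, but its path to the storage is cut, so only the surviving node's writes reach the shared device.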

Leandro


> -- 
> Fabrizio Lippolis                
> fabrizio.lippolis at aurigainformatica.it
> Auriga Informatica s.r.l.            Via Don Guanella 15/B - 
> 70124 Bari
> Tel.: 080/5025414 - Fax: 080/5027448 - 
> http://www.aurigainformatica.it/
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
> 



