Re: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster

Fabrizio Lippolis Fabrizio.Lippolis at AurigaInformatica.it
Tue Jun 27 09:51:58 UTC 2006


Leandro Dardini wrote:

> If something happens between the two machines, they fence each other.

I have configured manual fencing, but as I wrote, it is not very useful 
since, I think, it requires manual intervention, which may not be possible 
immediately. I am therefore looking for a way to keep the services running 
even when this happens. This is not the first time the problem has 
arisen, apparently for no reason, though the last occurrence was a long 
time ago.

> You can try to "ping" each machine from the other and check the connectivity state when the problem arises.

Sometimes the machines are completely locked up and it is not even 
possible to log in; a hard power-off is necessary in this case. At other 
times it looks like only the cluster service is locked up: I can ping 
the other machine normally even though the cluster is not working.
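
Since the hangs are intermittent, the ping check could be left running in 
the background so that the connectivity state at the moment of a hang is 
on record. What follows is only a minimal sketch in Python, assuming a 
Linux iputils `ping` (for the -c and -W options); the function names, the 
peer name and the log path are hypothetical and not taken from the 
cluster configuration.

```python
#!/usr/bin/env python3
"""Periodically ping the peer node and append the result to a log file."""
import os
import subprocess
import sys
import time
from datetime import datetime


def is_reachable(host, timeout_s=2):
    """Return True if a single ICMP echo to host is answered in time."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 1,
        )
    except (subprocess.TimeoutExpired, OSError):
        # ping hung or could not be started: treat as unreachable
        return False
    return result.returncode == 0


def format_entry(when, host, reachable):
    """Build one log line: ISO timestamp, host, and up/DOWN state."""
    state = "up" if reachable else "DOWN"
    return f"{when.isoformat(timespec='seconds')} {host} {state}"


def monitor(host, logfile, interval_s=5):
    """Probe host forever, appending one line per probe to logfile."""
    while True:
        line = format_entry(datetime.now(), host, is_reachable(host))
        with open(logfile, "a") as fh:
            fh.write(line + "\n")
            fh.flush()
            # Force the line to disk in case the machine locks up hard.
            os.fsync(fh.fileno())
        time.sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: pingmon.py <peer-host> <logfile>
    monitor(sys.argv[1], sys.argv[2])
```

Run on both nodes, the last lines written before a hang would show 
whether the link was still answering at that point.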

> Maybe an "overly intelligent switch" is handling the traffic and applying some sort of "traffic shaping and control".

There is nothing like that: the two machines are connected by a 1 Gbit 
crossover cable, not even a long one, supplied by HP with the two machines.

-- 
Fabrizio Lippolis                fabrizio.lippolis at aurigainformatica.it
Auriga Informatica s.r.l.            Via Don Guanella 15/B - 70124 Bari
Tel.: 080/5025414 - Fax: 080/5027448 - http://www.aurigainformatica.it/



