R: [Linux-cluster] "Missed too many heartbeats" messages and hung cluster

Thu Jun 29 14:00:04 UTC 2006

On Thu, 2006-06-29 at 09:47 +0200, Fabrizio Lippolis wrote:
> Leandro Dardini wrote:
> 
> > A fencing device is required to guarantee write consistency. If one node fails to communicate with the other devices, it can keep writing unconditionally, and bye bye to GFS.
> > A fencing device is not only a power-fence device. In my case it is the fibre channel switch. When a node has to be fenced, the others telnet to the fibre channel switch and turn off its port. This doesn't power-cycle the node, but it blocks writes to the shared device. What kind of shared device are you using?
> 
> It's a GFS file system on a disk array. Since I built the cluster for 
> MySQL and LDAP services, it's the file system where the database and 
> directory files actually reside. The disk array is physically connected 
> to both machines by a SCSI cable.
> 

You might be getting lockouts due to the storage subsystem you are
using.  GFS requires concurrent read/write access to the storage
devices, which generally overwhelms a direct-attached SCSI array.
The configuration you describe will not be stable, since while one node
is accessing the storage, the other machine is completely locked out of
the bus.  This is probably the cause of some of the missed heartbeats
you are seeing.  It has been a long time since we have run in that
configuration, so I am not sure of the current behavior.  Use Fibre
Channel, iSCSI or GNBD as proper storage infrastructure.
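
As a rough illustration of how a node that stalls for too long ends up
being declared dead, the sketch below counts consecutive missed
heartbeats against a threshold.  The class name, the threshold value and
the logic itself are assumptions made for illustration only; this is not
the actual cluster manager code.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks one peer node (hypothetical model, not the real cluster manager)."""
    max_missed: int = 5   # assumed threshold; real clusters use their own value
    missed: int = 0       # consecutive intervals without a heartbeat
    alive: bool = True

    def tick(self, heartbeat_received: bool) -> bool:
        """Process one heartbeat interval; return True while the peer counts as alive."""
        if not self.alive:
            # Once declared dead, the peer stays dead until it rejoins (not modeled).
            return False
        if heartbeat_received:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed > self.max_missed:
                # "Missed too many heartbeats": the peer would now be fenced.
                self.alive = False
        return self.alive


def simulate(pattern, max_missed=5):
    """Feed a sequence of True/False heartbeat arrivals to a fresh monitor."""
    monitor = HeartbeatMonitor(max_missed=max_missed)
    return [monitor.tick(received) for received in pattern]
```

In this model, a node that is locked out of the bus for more than
max_missed intervals in a row is marked dead, while shorter stalls only
reset the counter once a heartbeat arrives again.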

Kevin