[Linux-cluster] Starter Cluster / GFS

Gordan Bobic gordan at bobich.net
Thu Nov 11 10:07:31 UTC 2010


Jankowski, Chris wrote:
> Gordan,
> 
> I do understand the mechanism.  I was trying to gently point out that
> this behaviour is unacceptable for my commercial IP customers. The customers
> buy clusters for high availability. Losing the whole cluster due to a single
> component failure (the heartbeat link) is not acceptable. The heartbeat link is
> a huge SPOF. And the cluster design does not support redundant links for
> heartbeat.
> 
> Also, none of the commercially available UNIX clusters or Linux clusters
> (HP ServiceGuard, Veritas, SteelEye) would display this type of behaviour
> and they do not clobber cluster filesystems.  So, it is possible to
> achieve acceptable reaction to this type of failure.

My point was that you can easily work around the fencing race by
introducing a staggered delay into fencing, so that the nodes do not all
try to fence each other at the same instant.
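
To illustrate the idea, here is a rough sketch in Python. It is not taken
from any cluster software; the function names, the 5-second step and the
"lowest node ID fences first" rule are all assumptions made up for the
example. Each node waits a different amount of time before fencing, so
when the heartbeat link drops only one node gets to fire.

```python
# Hypothetical model of staggered fencing delays (not real cluster code).
# Assumption: each node has a unique integer ID starting at 1, and a node
# that has been fenced never gets to run its own fence action.

def fence_delay(node_id, step=5.0):
    """Seconds a node waits before fencing its peers; node 1 waits 0s."""
    return (node_id - 1) * step


def resolve_fence_race(node_ids, step=5.0):
    """Return (survivor, fenced_nodes) after a heartbeat link failure.

    Every node schedules a fence action after its own delay. The node with
    the shortest delay fires first and fences the others, which therefore
    never get the chance to fence it back.
    """
    delays = {n: fence_delay(n, step) for n in node_ids}
    survivor = min(delays, key=delays.get)
    fenced = sorted(n for n in node_ids if n != survivor)
    return survivor, fenced


if __name__ == "__main__":
    survivor, fenced = resolve_fence_race([1, 2])
    print("survivor:", survivor, "fenced:", fenced)
```

With step=0 every node would have the same delay, which is the original
race: both nodes fire at once and the outcome depends on timing.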

I have never tried it, but are you sure bonded devices don't work for
heartbeat?

Gordan



