[Linux-cluster] Starter Cluster / GFS
gordan at bobich.net
Thu Nov 11 10:07:31 UTC 2010
Jankowski, Chris wrote:
> I do understand the mechanism. I was trying to gently point out that
> this behaviour is unacceptable for my commercial IP customers. The customers
> buy clusters for high availability. Losing the whole cluster due to a single
> component failure (the heartbeat link) is not acceptable. The heartbeat link is
> a huge SPOF. And the cluster design does not support redundant links for
> Also, none of the commercially available UNIX clusters or Linux clusters
> (HP ServiceGuard, Veritas, SteelEye) would display this type of behaviour
> and they do not clobber cluster filesystems. So, it is possible to
> achieve acceptable reaction to this type of failure.
My point was that you can easily work around the race condition by
introducing a staggered delay into fencing, so that the nodes do not
try to fence each other at the same moment.
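To illustrate why a staggered delay helps, here is a toy Python model of a
two-node fence race. It is only a sketch, not cluster code: the function
name, the node names and the timing values are all made up for illustration,
and it assumes that both nodes notice the heartbeat loss at the same instant
and that a fence operation takes a fixed time to complete.

```python
def fence_race(delays, fence_time=1.0):
    """Toy model of a two-node fence race after heartbeat loss.

    delays     -- dict mapping node name to the seconds that node waits
                  before it starts fencing its peer (hypothetical values)
    fence_time -- seconds a fence operation needs to complete (assumed fixed)

    Returns a tuple (survivors, fenced), each a list of node names.
    """
    # Time at which each node's fence operation against its peer would finish.
    completion = {node: delay + fence_time for node, delay in delays.items()}
    earliest = min(completion.values())
    winners = [node for node, t in completion.items() if t == earliest]

    if len(winners) > 1:
        # Identical delays: both fence operations finish together,
        # so both nodes end up fenced and the whole cluster is lost.
        return [], sorted(delays)

    # Staggered delays: the faster node fences its peer before the peer's
    # own fence operation can complete, so exactly one node survives.
    winner = winners[0]
    return [winner], [node for node in delays if node != winner]


if __name__ == "__main__":
    print(fence_race({"node1": 0, "node2": 0}))  # no stagger
    print(fence_race({"node1": 0, "node2": 5}))  # node2 waits 5 seconds
```

In this model the node with the shorter delay always wins, which is the
point of staggering: you decide in advance which node should survive a
heartbeat failure instead of leaving it to chance.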
I have never tried it, but are you sure bonded devices don't work for heartbeat?