[Linux-cluster] Starter Cluster / GFS

Fri Nov 12 02:22:16 UTC 2010

Digimer,

>>>>I can't speak to heartbeat, but under RHCS you can have multiple fence methods and devices, and they will be used in the order that they are found in the configuration file.

Separate heartbeat networks (not a single network with a bonded interface) are what my customers require.  I believe this is not available in the standard Linux Cluster as distributed by Red Hat.  This is completely independent of which fencing device or method is used.

>>>>With the power-based devices I've used (again, just IPMI and NA), the poweroff call is more or less instant. I've not seen, personally, a lag exceeding a second with these devices. I would consider a fence device that does not disable a node in <1 second to be flawed.

1. In the world where I work, separate power-based devices are not an option. Blade servers do not even have power supplies. They use common power from the blade enclosure.  The only access to the power state is through the service processor.

2. We are not talking about long delays here. The whole cycle of taking the power off a blade, including login to the service processor, is less than 1 ms. Delay or lack thereof is not a problem.  The transactional nature of the processing is the issue.

Regards,

Chris Jankowski

-----Original Message-----
From: Digimer [mailto:linux at alteeve.com] 
Sent: Friday, 12 November 2010 03:39
To: linux clustering
Cc: Jankowski, Chris
Subject: Re: [Linux-cluster] Starter Cluster / GFS

On 10-11-11 04:59 AM, Jankowski, Chris wrote:
> Gordan,
> 
> I do understand the mechanism.  I was trying to gently point out that this behaviour is unacceptable for my commercial IP customers. The customers buy clusters for high availability. Losing the whole cluster due to a single component failure (the heartbeat link) is not acceptable. The heartbeat link is a huge SPOF. And the cluster design does not support redundant links for heartbeat.
> 
> Also, none of the commercially available UNIX clusters or Linux clusters (HP ServiceGuard, Veritas, SteelEye) would display this type of behaviour and they do not clobber cluster filesystems.  So, it is possible to achieve acceptable reaction to this type of failure.
> 
> Regards,
> 
> Chris Jankowski

I can't speak to heartbeat, but under RHCS you can have multiple fence methods and devices, and they will be used in the order that they are found in the configuration file.
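
To illustrate the ordering only, here is a rough Python sketch of "try each fence method in the order it is listed, stop at the first that works". It is not the RHCS fence daemon's code. The function name fence_node, the (name, callable) pairs, and the two simulated devices are all made up for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model: a fence method is a (name, action) pair, where the
# action takes a node name and returns True if the node was fenced.
FenceMethod = Tuple[str, Callable[[str], bool]]


def fence_node(node: str, methods: Sequence[FenceMethod]) -> Optional[str]:
    """Try each method in listed order; return the name of the first
    method that succeeds, or None if every method fails."""
    for name, action in methods:
        if action(node):
            return name  # stop at the first success; later methods are not tried
    return None


# Simulated devices, for illustration only.
def ipmi_poweroff(node: str) -> bool:
    return False  # pretend the IPMI interface is unreachable


def node_assassin_poweroff(node: str) -> bool:
    return True  # pretend the fallback device works


if __name__ == "__main__":
    configured = [("ipmi", ipmi_poweroff), ("node_assassin", node_assassin_poweroff)]
    print(fence_node("node1", configured))
```

With the simulated devices above, the first method fails and the second one is used, which is the fallback behaviour I was describing.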

With the power-based devices I've used (again, just IPMI and NA), the poweroff call is more or less instant. I've not seen, personally, a lag exceeding a second with these devices. I would consider a fence device that does not disable a node in <1 second to be flawed.

--
Digimer
E-Mail: digimer at alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org