[Linux-cluster] rhel 6.2 network bonding interface in cluster environment

Mon Jan 9 13:16:18 UTC 2012

On 09/01/12 05:24, Digimer wrote:

> With both of the bond's NICs down, the bond itself is going to drop.

Odds are, both NICs are plugged into the same switch.

(Assuming the OP isn't running the NICs cabled directly NIC-to-NIC, which I 
have found in the past tends to be flaky when NWay autonegotiation becomes 
involved.)

I'm assuming "heartbeat" is a dedicated corosync (v)lan.

To the OP: Please look at 
http://www.cyberciti.biz/howto/question/static/linux-ethernet-bonding-driver-howto.php 
and the descriptions of bonding there.

The type of bond you want for this purpose is either LACP (802.3ad, mode 4) 
(if the NICs are plugged into a single switch or switch stack which supports 
LACP) or active-backup failover (mode 1) if separate switches are involved.

Any other mode is potentially failure prone if things go wrong.
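For reference, the choice between the two can be written down as a small lookup. This is only a sketch: the mode numbers and names are the ones the Linux bonding driver uses, but pick_bond_mode() and its arguments are names I made up for illustration, and falling back to mode 1 when a single switch has no LACP support is my own assumption.

```python
# Mode numbers and names as used by the Linux bonding driver.
BOND_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",  # LACP
    5: "balance-tlb",
    6: "balance-alb",
}


def pick_bond_mode(same_switch_or_stack, switch_supports_lacp):
    """Hypothetical helper: choose between LACP and active-backup.

    LACP needs both links to end on one switch (or one stack) that
    speaks LACP. Everything else falls back to active-backup, which
    needs no cooperation from the switches.
    """
    if same_switch_or_stack and switch_supports_lacp:
        return 4
    return 1
```

With NICs split across two independent switches, pick_bond_mode(False, True) gives 1, i.e. BOND_MODES[1] == "active-backup".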

FWIW: My heartbeat setup is as follows.

2 switches with a 4-way LACP bond between them.

2 NICs on each cluster member in bonding mode 1, one NIC on each switch.

This setup is resilient against individual link (NIC, cable or fat 
fingers) OR switch failures.
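A setup like this only helps if both slaves are actually up, so it is worth checking from time to time. The following is a rough sketch, not a finished monitoring tool: it parses the text the bonding driver exposes under /proc/net/bonding/ and reports whether the bond is in active-backup mode with the expected number of slaves showing link. The function names and the expected_slaves parameter are my own, and it only looks at the "Bonding Mode", "Slave Interface" and "MII Status" lines.

```python
def parse_bond_status(text):
    """Return (mode string, {slave name: MII status}) from bonding status text."""
    mode = None
    slaves = {}
    current = None  # slave section we are in; None while still in the bond header
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = "unknown"
        elif key == "MII Status" and current is not None:
            # The header also has an "MII Status" line for the bond itself;
            # that one is skipped because no slave section has started yet.
            slaves[current] = value
    return mode, slaves


def bond_is_healthy(text, expected_slaves=2):
    """True if the bond is active-backup and enough slaves report link up."""
    mode, slaves = parse_bond_status(text)
    up = [name for name, status in slaves.items() if status == "up"]
    return mode is not None and "active-backup" in mode and len(up) >= expected_slaves
```

To use it, read the status file (for example /proc/net/bonding/bond0, assuming the bond is called bond0) and pass its contents to bond_is_healthy().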

Switches used for this purpose are best completely isolated from the 
rest of the network and multicast traffic control should be DISABLED.

Corosync can be set to fail over to the public lan as a last resort but 
I've found it's not necessary - if things get bad enough that the 
private lan is completely out of action then the systems should shut 
themselves down (bad data is worse than zero data).

Switch ports should be set "portfast" or whatever the non-Cisco 
equivalent is, or else ~30 seconds will be wasted after every link-up 
while the switch checks that whatever's attached doesn't have a lan 
segment behind it. That delay can also lead to fencing.
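The arithmetic behind the ~30 seconds, as a back-of-envelope sketch: with classic spanning tree a port sits in the listening and then the learning state for one forward delay each (15 seconds by default) before it forwards traffic. The function names below are made up, the token timeout is whatever your cluster is configured with (I am not assuming a particular value), and treating a portfast port as forwarding immediately is a simplification.

```python
def stp_forwarding_delay_s(forward_delay_s=15):
    """Seconds a classic STP port spends in listening + learning after link-up."""
    return 2 * forward_delay_s


def link_up_outlasts_token(token_timeout_ms, forward_delay_s=15, portfast=False):
    """True if the port stays blocked for longer than the cluster's token timeout.

    token_timeout_ms is a value you supply from your own configuration.
    """
    blocked_ms = 0 if portfast else stp_forwarding_delay_s(forward_delay_s) * 1000
    return blocked_ms > token_timeout_ms
```

For any token timeout shorter than 30 seconds the first call returns True without portfast and False with it, which is the case where a node can be declared dead while its port is still waiting.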