[Linux-cluster] rhel 6.2 network bonding interface in cluster environment

SATHYA - IT sathyanarayanan.varadharajan at precisionit.co.in
Mon Jan 9 13:23:42 UTC 2012


The corosync (heartbeat) network is not connected to a switch. The two
servers are connected directly to each other.


Sathya Narayanan V
Solution Architect	
-----Original Message-----
From: Alan Brown [mailto:ajb2 at mssl.ucl.ac.uk] 
Sent: Monday, January 09, 2012 6:46 PM
To: linux clustering
Cc: Digimer; SATHYA - IT
Subject: Re: [Linux-cluster] rhel 6.2 network bonding interface in cluster

On 09/01/12 05:24, Digimer wrote:

> With both of the bond's NICs down, the bond itself is going to drop.

Odds are, both NICs are plugged into the same switch.

(assuming the OP isn't running things plugged nic-nic - which I have found
in the past tends to be flakey when N-way negotiation becomes

I'm assuming "heartbeat" is a dedicated corosync (V)LAN.

To the OP: Please look at
and the descriptions of bonding there.

The type of bond you want for this purpose is either LACP (802.3ad, mode 4)
if the NICs are plugged into a single switch or switch stack which supports
LACP, or active-backup failover (mode 1) if separate switches are involved.

Any other mode is potentially failure-prone if things go wrong.
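
As a rough illustration of the rule above (not part of the original thread), the sketch below picks a bonding mode from the switch topology and prints the matching BONDING_OPTS line for a RHEL-style ifcfg-bond0 file. The function name, the topology flag and the miimon=100 link-monitoring interval are assumptions made for the example; "802.3ad" and "active-backup" are the bonding driver's names for modes 4 and 1.

```python
def bonding_opts(single_lacp_switch: bool, miimon_ms: int = 100) -> str:
    """Return a BONDING_OPTS line following the rule of thumb above.

    single_lacp_switch: True when both NICs go to one switch (or one
    switch stack) that supports LACP; False when they go to separate
    switches. miimon_ms is an assumed link-monitoring interval.
    """
    # LACP is bonding mode 4 (802.3ad); active-backup is mode 1.
    mode = "802.3ad" if single_lacp_switch else "active-backup"
    return f'BONDING_OPTS="mode={mode} miimon={miimon_ms}"'


if __name__ == "__main__":
    # NICs on two separate switches: use active-backup.
    print(bonding_opts(single_lacp_switch=False))
```

The printed line would go into the bond's ifcfg file; the slave interfaces still need their own MASTER/SLAVE entries, which are not shown here.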

FWIW: My heartbeat setup is as follows.

2 switches with a 4-way LACP bond between them.

2 NICs on each cluster member in bonding mode 1, one NIC on each switch.

This setup is resilient against individual link (NIC, cable or fat
fingers) OR switch failures.
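
One way to confirm that a mode 1 bond like this really has both links available is to read the bonding driver's status file. The sketch below is only an illustration: the bond name bond0 is an assumption, and the field labels ("Bonding Mode", "Currently Active Slave", "Slave Interface", "MII Status") are the ones the kernel bonding driver normally prints in /proc/net/bonding/, so check them against your own system.

```python
from pathlib import Path


def parse_bond_status(text: str) -> dict:
    """Extract mode, active slave and per-slave link state from the
    text of /proc/net/bonding/<bond>."""
    status = {"mode": None, "active_slave": None, "slaves": {}}
    current = None  # name of the slave section being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Currently Active Slave":
            status["active_slave"] = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" appears before any slave
            # section and is skipped; only per-slave states are kept.
            status["slaves"][current] = value
    return status


if __name__ == "__main__":
    path = Path("/proc/net/bonding/bond0")  # assumed bond name
    if path.exists():
        info = parse_bond_status(path.read_text())
        print("mode:", info["mode"])
        print("active slave:", info["active_slave"])
        for name, state in info["slaves"].items():
            print(f"  {name}: {state}")
    else:
        print(f"{path} not found (no such bond on this host)")
```

If one slave shows a state other than "up", the bond is running without redundancy even though the cluster still appears healthy.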

Switches used for this purpose are best completely isolated from the rest of
the network and multicast traffic control should be DISABLED.

Corosync can be set to fail over to the public LAN as a last resort, but I've
found it's not necessary: if things get bad enough that the private LAN is
completely out of action then the systems should shut themselves down (bad
data is worse than zero data).

Switch ports should be set to "portfast" (or whatever the non-Cisco equivalent
is), or else ~30 seconds will be wasted after each link-up while the switch
checks that whatever's attached doesn't have a LAN segment behind it. A node
that is cut off for that long can also end up being fenced.
