[Linux-cluster] Clusters with multihomed hosts

Steven Whitehouse swhiteho at redhat.com
Thu Nov 13 16:54:24 UTC 2008


Hi,

On Thu, 2008-11-13 at 09:21 -0700, Michael O'Sullivan wrote:
> Hi all,
> 
> I need to know more about using redundant NICs in clusters.
> 
> I have a 2-node cluster with 2 NICs in each node. The first NICs on each 
> node are connected to one switch, the second NICs on each node are 
> connected to another switch. This is an experimental arrangement so I am 
> using /etc/hosts instead of DNS. It appears that the cluster software 
> becomes confused if I put both NICs for the hosts in the /etc/hosts 
> file, even if they are in different subnets. Here is the /etc/hosts file 
> I would like to use:
> 
> # localhost line
> 192.168.10.1     node1
> 192.168.10.2     node2
> 192.168.20.1     node1 # Second NIC on node 1
> 192.168.20.2     node2 # Second NIC on node 2
> 
> but this seems to cause the cluster to hang (confused about which NIC to 
> use?), so I have removed the last 2 lines and everything works fine. 
> However, this means if the switch on the 192.168.10.x subnet fails the 
> cluster will "break". If the cluster would recognise that node1 and 
> node2 are available via the second NICs then I wouldn't have to worry 
> about this single point-of-failure.
> 
The trouble with this kind of thing is that it's not easy to control
which external IP address a particular application uses, as you have
discovered. It can be done though, with the aid of iproute2.

The kernel will look at the routing table to work out where to send a
particular packet, and once it's found a suitable destination interface
it will then look at the various possible source IPs on that interface
in order to work out which one to use. It tries to use the source
address which has the most bits matching the destination (counting from
the network end to the host end of the IP address), so that if the
destination address is on a particular subnet, it will try to use an IP
from the same subnet as the source address if one is available.
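
To make the "most bits matching" idea concrete, here is a rough Python
sketch of that rule only. It is not the kernel's code, and the real
selection logic has more rules than this. The function names are made
up for illustration, and the candidate list assumes, hypothetically,
that both of node1's addresses are available as possible sources.

```python
import ipaddress

def common_prefix_len(a: str, b: str) -> int:
    """Count the leading bits two IPv4 addresses share (network end first)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()

def pick_source(candidates, destination):
    """Pick the candidate sharing the longest prefix with the destination."""
    return max(candidates, key=lambda src: common_prefix_len(src, destination))

# Hypothetical candidate source addresses, taken from the /etc/hosts above.
sources = ["192.168.10.1", "192.168.20.1"]
chosen = pick_source(sources, "192.168.20.2")
print(chosen)
```

For a destination of 192.168.20.2 the address 192.168.20.1 shares 30
leading bits, against 19 for 192.168.10.1, so the same-subnet address
is the one picked.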

You can alter this quite easily though: you just set up a second routing
table and use routing rules in order to select the correct table
according to your network. That's where iproute2 comes in, and there is
a set of docs here: http://lartc.org/
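
As a hedged illustration of what "a second routing table plus routing
rules" might look like on node1, the Python sketch below only builds
and prints the iproute2 commands so they can be reviewed before being
run by hand as root. The interface names (eth0, eth1) and the table
numbers (10, 20) are assumptions, not something taken from your setup,
and the helper name is hypothetical.

```python
def policy_routing_commands(dev, address, subnet, table):
    """Build the ip(8) commands for one NIC: a per-table route and a source rule."""
    return [
        # Route for the NIC's subnet, placed in its own table with a fixed source.
        f"ip route add {subnet} dev {dev} src {address} table {table}",
        # Packets sent from this address consult that table.
        f"ip rule add from {address} table {table}",
    ]

# Assumed layout for node1: (device, address, subnet, table number).
nics = [
    ("eth0", "192.168.10.1", "192.168.10.0/24", 10),
    ("eth1", "192.168.20.1", "192.168.20.0/24", 20),
]

commands = [cmd for nic in nics for cmd in policy_routing_commands(*nic)]
for cmd in commands:
    print(cmd)
```

Treat the output as a starting point to check against the LARTC docs
rather than a finished configuration.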

Also, just because you have two NICs connected to different switches
doesn't mean that you need to give them different IP addresses/subnets.
The Linux IP stack can easily cope with them being the same, which would
also simplify the situation that you have, where I suspect the cluster
stack has replied via a different NIC, so that the reply carried a
different source address from the one the other node expected.

> I have thought about bonding the NICs which (I think) would take care of 
> the problem, but I have heard that bonding two NICs usually does not 
> give double the bandwidth. I have read a little about high-availability 
> and failing over IP addresses, but this seems to be between different 
> nodes, not different NICs in the same host.
> 
> Would anyone please be able to give me some direction about the best way 
> to set up my cluster and NICs to make the cluster reliable in the event 
> of switch failure? And keep the full bandwidth of the NICs intact?
> 
> Thanks in advance for any help you can give. Kind regards, Mike
> 
The problem with bonding is that a single packet can only use a single
one of the parallel links. Also, by using multiple links on a single
stream (i.e. a TCP connection) you run the risk, if you are not careful,
of reordering the packets. That can cause slowdowns at the receiving
end, and possibly the generation of duplicate ACK packets which might
cause retransmissions at the sending end, further slowing things down.

The Linux bonding driver has various modes to try to avoid that, and in
addition it also has an 802.3ad mode, which allows it to automatically
negotiate settings with a switch. That's ideal if all the bonded links
for a particular node go to the same switch, but won't work across
switches as in your situation.
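
One way such modes avoid reordering is to hash each packet's addresses
so that a given pair of hosts always uses the same link. The Python
sketch below is a deliberately simplified illustration of that idea
(XOR of the last byte of each MAC address, modulo the number of links).
It is not the bonding driver's exact formula, and the MAC addresses are
made up.

```python
def pick_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Simplified layer-2 style hash: XOR the last MAC bytes, modulo link count."""
    def last_byte(mac: str) -> int:
        return int(mac.split(":")[-1], 16)
    return (last_byte(src_mac) ^ last_byte(dst_mac)) % n_links

# Made-up MAC addresses standing in for node1 and node2.
node1 = "00:16:3e:00:00:01"
node2 = "00:16:3e:00:00:02"

# Every packet between the same two hosts hashes to the same link.
links_used = {pick_link(node1, node2, 2) for _ in range(1000)}
print(links_used)
```

With a hash like this, all traffic between the two nodes stays on one
link, which keeps packets in order but also suggests why a 2-node
cluster may not see double the bandwidth from bonding.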

I suspect that the choice will come down to one of the following:

1. something easy to set up & not very efficient in terms of bandwidth,
but probably not too bad either.
    -> choose bonding (just be sure to select the right mode)

2. something more complex to set up, but which can be made to make full
use of all of the available bandwidth, and be extended into more
complicated setups (dynamic routing, etc), given enough application
support & tweaking.
    -> choose the IP based solution

Steve.




