[Linux-cluster] Bonding Interfaces: Active Load Balancing & LACP
epretorious at yahoo.com
Wed Jun 13 03:08:58 UTC 2012
A good friend explained it this way:
The problem you describe is a fairly well-known issue and there's really not a good fix for it. Often, a switch will support multiple addressing algorithms (L2, L2_L3, L2_L3_L4, L3_L4). All of them bind a flow to a given port for egress. This means that if you have a single data flow between two servers that are connected to the same switch, you are limited to the speed of a single uplink.
I'm assuming, in the case of the HP Procurve 2824, that the "SA/DA (Source Address/Destination Address) method of distributing traffic" is really marketing speak for L2 hashing.
If there's an option to do L3_L4
or L2_L3_L4, you might be slightly better off if there are multiple
flows involved. In your case, it doesn't sound like there actually are
multiple flows. If it really is only one flow, you'd need a 10GbE switch and interfaces to go faster.
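To make the difference between the hashing policies concrete, here is a rough sketch in Python. It is not the ProCurve's (or any vendor's) actual algorithm: CRC32 merely stands in for the hash, the function names (pick_port, l2_key, l3_l4_key, make_flow) are made up for illustration, and the MAC/IP addresses and port numbers are invented.

```python
import zlib

def pick_port(key_fields, n_ports):
    """Hash the chosen header fields and map the result to one egress port.

    CRC32 is only a stand-in; real switches use their own (often simpler) hash.
    """
    key = "|".join(str(field) for field in key_fields).encode()
    return zlib.crc32(key) % n_ports

def l2_key(frame):
    # L2 policy: only the source and destination MAC addresses are hashed.
    return (frame["src_mac"], frame["dst_mac"])

def l3_l4_key(frame):
    # L3_L4 policy: IP addresses and TCP/UDP ports are hashed instead.
    return (frame["src_ip"], frame["dst_ip"], frame["dst_port"], frame["src_port"])

def make_flow(src_port):
    # Hypothetical traffic from Host A to Host B; all values are invented.
    return {
        "src_mac": "00:00:00:00:00:0a", "dst_mac": "00:00:00:00:00:0b",
        "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
        "src_port": src_port, "dst_port": 873,
    }

N_PORTS = 2  # two bonded 1 Gbps links toward Host B
flows = [make_flow(p) for p in range(5000, 5008)]

# Every flow between the same two hosts has the same MAC pair, so L2 hashing
# always lands on the same port. L3_L4 hashing sees different source ports.
l2_ports = {pick_port(l2_key(f), N_PORTS) for f in flows}
l3_l4_ports = {pick_port(l3_l4_key(f), N_PORTS) for f in flows}

print("L2 hashing uses ports:", sorted(l2_ports))
print("L3_L4 hashing uses ports:", sorted(l3_l4_ports))
```

With L2 hashing, every flow between the two hosts maps to one port, no matter how many flows there are. With L3_L4 hashing, separate flows can land on different ports, but any single flow still stays on one port, so it is still capped at the speed of one link.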
Brocade has supposedly implemented frame-spraying (true round-robin) on
their latest switches. They do this by the use of custom ASICs derived
from their fibre channel switch lines, which have had frame-spraying
for some time.
In frame-spraying (assuming a 4-port
port-channel/loadshare), frame A goes to port 1, frame B goes to port 2,
frame C goes to port 3, frame D goes to port 4, frame E goes to port 1,
frame F goes to port 2, etc.
This method supposedly gives a
fairly good traffic distribution even with small numbers of flows.
There are still corner cases where it wouldn't work well. It also
doesn't fix any problems that can arise if the sending system doesn't
implement frame-spraying (which it probably won't).
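The round-robin assignment described above can be sketched in a few lines of Python. This only illustrates the per-frame distribution idea, not how Brocade's ASICs actually implement it; the function name spray is made up for the example.

```python
from itertools import cycle

def spray(frames, n_ports):
    """Assign each frame to the next port in strict rotation (ports are 1-based)."""
    ports = cycle(range(1, n_ports + 1))
    return [(frame, next(ports)) for frame in frames]

# Six frames over a 4-port port-channel, as in the description above.
for frame, port in spray("ABCDEF", 4):
    print(f"frame {frame} -> port {port}")
```

Because the port is chosen per frame rather than per flow, even a single flow is spread over all four links. The trade-off is that frames of one flow can arrive out of order.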
> From: Eric <epretorious at yahoo.com>
>To: "linux-cluster at redhat.com" <linux-cluster at redhat.com>
>Sent: Wednesday, June 6, 2012 9:12 PM
>Subject: [Linux-cluster] Bonding Interfaces: Active Load Balancing & LACP
>I'm currently using the HP Procurve 2824 24-port Gigabit Ethernet switch for a backside network that synchronizes file systems between the nodes in the group. Each host has four Gigabit NICs, and the goal is to bond two of them to create a 2 Gbps link from any host to any other host. What I'm finding is that the bonded links are only capable of 1 Gbps from any host to any other host. Is it possible to create a multi-Gigabit link between two hosts (without having to upgrade to 10G) using a switch that "uses the SA/DA (Source Address/Destination Address) method of distributing traffic across the trunked links"?
>The problem, at least as far as I can tell, comes down to the limitation of ARP resolution (in the host) and mac-address tables (in the switch).
>When configured to use Active Load Balancing, the kernel driver leaves each interface's MAC address unchanged. In this scenario, when Host A sends traffic to Host B, the kernel uses the MAC address of only one of Host B's NICs as the DA. When the packet arrives at the switch, the switch consults the mac-address table for the DA and then sends the packet to the interface connected to the NIC with that MAC address. Thus packets from Host A to Host B will only leave the switch through one interface - the interface connected to the NIC whose MAC address equals the DA. This has the effect of limiting the throughput from Host A to Host B to the speed of that one interface.
>When configured to use IEEE 802.3ad (LACP), the kernel driver assigns the same MAC address to all of the host's interfaces. In this scenario, when Host A sends traffic to Host B, the kernel uses Host B's shared MAC address as the DA. When the packet arrives at the switch, the switch creates a hash based on the SA/DA pair, consults the mac-address table for the DA, and assigns the flow (i.e., traffic from Host A to Host B) to one of the interfaces connected to Host B. Thus packets from Host A to Host B will only leave the switch through one interface - the interface determined by the SA/DA hash. This has the effect of limiting the throughput from Host A to Host B to the speed of the one interface determined by the hashing method. However, if the flow (from Host A to Host B's shared MAC address) were to be spread across the different interfaces (as the packets were leaving the switch), the throughput between the hosts would equal the aggregate of the links (IIUC).
>Is this a limitation of the Procurve's implementation of LACP? Do other switches use different methods of distributing traffic across the trunked links? Is there another method of aggregating the links between the two hosts (e.g., multipathing)?