[Linux-cluster] fencing for no reason that I can see

Jeff Sturm jeff.sturm at eprize.com
Tue Sep 11 14:40:06 UTC 2012


> -----Original Message-----
> From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com]
> On Behalf Of Terry
> Sent: Monday, September 10, 2012 10:09 PM
> 
> I am using a 3-interface 802.3ad link aggregate on the production network.  I could
> either use an iSCSI interface or split one of the three bond slave interfaces out and
> dedicate it to inter-node traffic.
> I was also looking into a potential multicast issue but I believe my switches support it
> fine (Foundry FLS).  I wouldn't think it would be intermittent like this.  Anyone have any
> other thoughts?

Could be many things, truthfully.  In our experience VLAN tagging hasn't caused any problems, but I'd certainly heed the warnings about "exotic bond modes".  We've found that simpler is better (i.e. more reliable) when it comes to networks.  Bonding mode 1 (active-backup) usually recovers fast enough to prevent loss of a cluster node.  By splitting active-backup interfaces across multiple independent switches, we've been able to take a switch down administratively (for updates, etc.) without losing the cluster.
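
As a quick way to confirm which bonding mode a node is actually running, the status file the Linux bonding driver exposes under /proc/net/bonding/ can be parsed. The following is a minimal sketch, not a tested tool: the helper names and the SAMPLE text are illustrative assumptions, and the exact wording of the status file can vary between driver versions.

```python
import re

def parse_bonding_status(text):
    """Return (mode, active_slave) parsed from the text of /proc/net/bonding/<bond>.

    Either element is None if the corresponding line is not present.
    """
    mode = re.search(r"^Bonding Mode:\s*(.+)$", text, re.MULTILINE)
    active = re.search(r"^Currently Active Slave:\s*(\S+)", text, re.MULTILINE)
    return (mode.group(1).strip() if mode else None,
            active.group(1) if active else None)

def is_active_backup(text):
    """True if the status text reports bonding mode 1 (active-backup)."""
    mode, _ = parse_bonding_status(text)
    return mode is not None and "active-backup" in mode

# Illustrative sample only; real output contains more fields per slave.
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.6.0

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
"""

if __name__ == "__main__":
    print(parse_bonding_status(SAMPLE))
    print(is_active_backup(SAMPLE))
```

On a live node the text would come from something like open("/proc/net/bonding/bond0").read(), where bond0 is an assumed interface name.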

Make sure spanning-tree is either off or using RSTP everywhere.  The default STP forwarding delays are long enough to crash a cluster.  (We learned this the hard way.)
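
To put rough numbers on the spanning-tree point: with the classic IEEE 802.1D default timers (forward delay 15 seconds, max age 20 seconds), a port spends 15 seconds listening and 15 seconds learning before it forwards traffic, and an indirect failure can add up to the max age on top of that. The sketch below compares that outage with a cluster token timeout. The 10000 ms figure is an assumption (commonly cited as the cman default); check the totem token value actually configured on your cluster.

```python
# Classic IEEE 802.1D default timers, in seconds.
FORWARD_DELAY = 15
MAX_AGE = 20

# Assumption: totem token timeout in milliseconds; verify against your own config.
ASSUMED_TOKEN_TIMEOUT_MS = 10000

def stp_outage_seconds(forward_delay=FORWARD_DELAY, max_age=MAX_AGE,
                       indirect_failure=False):
    """Approximate time a port is not forwarding after a topology change.

    The port passes through listening and learning, each lasting one
    forward delay. For an indirect failure the bridge may first wait
    up to max_age before it reacts at all.
    """
    outage = 2 * forward_delay
    if indirect_failure:
        outage += max_age
    return outage

def outlasts_token_timeout(outage_s, token_timeout_ms):
    """True if the network outage is longer than the token timeout."""
    return outage_s * 1000 > token_timeout_ms

if __name__ == "__main__":
    for indirect in (False, True):
        outage = stp_outage_seconds(indirect_failure=indirect)
        print(indirect, outage,
              outlasts_token_timeout(outage, ASSUMED_TOKEN_TIMEOUT_MS))
```

Under these assumptions even the best case (30 seconds) is several times longer than the token timeout, which is consistent with nodes being fenced during an STP reconvergence.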

The other big problem we had turned out to be a firmware defect on the switch, so you can't rule that out either.  If there is any weakness in your network, RHCS is good at finding it!  I won't name the guilty vendor here, other than to say we've found Juniper gear works very well.  (Never tried Foundry.)

-Jeff





