[rhn-users] Net interface flap, 2650, RH ES 3.0 2.4.21-15.ELsmp - my resolution

Jed Donnelley jed at nersc.gov
Tue Sep 7 21:59:40 UTC 2004


All,

Back in late May I sent a message to both of these lists (Dell and Redhat) 
about a problem I had been having with the onboard GigE interfaces on a 
Dell 2650 running Redhat ES 3.  I've included that message below for 
reference.  In recent months I did some more testing and learned a few 
things about this problem:

1.  The problem seems to be somewhat switch dependent.  That is, others are 
running the same hardware and software but with a different switch and are 
not having the problem (e.g. there's at least one Dell switch which, if 
placed between the 2650 and another switch, eliminates the problem).

The switch I'm connected to is a Foundry FastIron 3 with the Ironcore chip set.

2.  With the appropriate switch (as above at least - others?) this problem 
is easy to reproduce.  Just do a kickstart (or any build) to Redhat ES 3.0 
on the Dell 2650 connected to the Foundry switch, then upgrade to the 
latest kernel (2.4.21-15.0.4.ELsmp anyway).  The initial build works 
fine.  After the upgrade the box will then no longer communicate on the 
network.

I tried a number of workarounds, including turning off auto-negotiation 
(with some help from our networking folks) and trying some other NICs that 
also used the tg3 driver (an old SysKonnect NIC).  None of these were 
successful.

Finally I gave up, took the advice of others, and installed an Intel NIC 
that uses the e1000 driver (Intel PWLA8492MT, "Intel PRO/1000 MT Dual Port 
Server Adapter").  Doing so finally solved my problem.

It seems clear there's some sort of interoperability problem involving the 
Broadcom chip set, the tg3 driver, and the Foundry switch.  I don't have 
time to explore this problem further, but I thought I would share with 
others what I learned from my testing.

--Jed http://www.nersc.gov/~jed/

Here's my original message for reference:
__________________________________________________________________________________________________
May 25, 2004:
I've been running RH ES 3.0 on a Dell 2650 server for some time with a 
GigE interface based on the Broadcom NetXtreme BCM5701 and the tg3 driver, 
most recently with the RH 2.4.21-9.0.1.ELsmp kernel.

Last week I tried upgrading to the RH 2.4.21-15.ELsmp kernel.  When I ran 
with that newer kernel the network interface on the box went into a loop of 
going up and down every few seconds.

Here's what I believe is the relevant part of the message log:

May 20 15:58:30 tsi kernel: Linux version 2.4.21-15.ELsmp 
(bhcompile at bugs.build.redhat.com) (gcc version 3.2.3 20030502 (Red Hat 
Linux 3.2.3-34)) #1 SMP Thu Apr 22 00:18:24 EDT 2004
...
May 20 15:58:31 tsi kernel: Processors: 4
May 20 15:58:31 tsi kernel: xAPIC support is present
May 20 15:58:31 tsi kernel: Enabling APIC mode: Flat.^IUsing 3 I/O APICs
May 20 15:58:31 tsi kernel: Kernel command line: ro root=/dev/md0
May 20 15:58:31 tsi kernel: Initializing CPU#0
May 20 15:58:31 tsi kernel: Detected 1993.566 MHz processor.
<actually 2 hyperthreaded processors>
...
May 20 15:58:38 tsi kernel: eth0: Tigon3 [partno(BCM95701A10) rev 0105 
PHY(5701)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:f8:0f:04
May 20 15:58:38 tsi kernel: eth0: HostTXDS[0] RXcsums[1] LinkChgREG[1] 
MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
May 20 15:58:38 tsi kernel: eth1: Tigon3 [partno(BCM95701A10) rev 0105 
PHY(5701)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:f8:0f:05
May 20 15:58:38 tsi kernel: eth1: HostTXDS[0] RXcsums[1] LinkChgREG[1] 
MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
May 20 15:58:38 tsi kernel: ip_tables: (C) 2000-2002 Netfilter core team
May 20 15:58:38 tsi kernel: ip_conntrack version 2.1 (8192 buckets, 65536 
max) - 304 bytes per conntrack
May 20 15:58:38 tsi kernel: tg3.c:v3.1 (April 3, 2004)
May 20 15:58:38 tsi kernel: eth0: Tigon3 [partno(BCM95701A10) rev 0105 
PHY(5701)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:f8:0f:04
May 20 15:58:38 tsi kernel: eth0: HostTXDS[0] RXcsums[1] LinkChgREG[1] 
MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
May 20 15:58:38 tsi kernel: eth1: Tigon3 [partno(BCM95701A10) rev 0105 
PHY(5701)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:f8:0f:05
May 20 15:58:38 tsi kernel: eth1: HostTXDS[0] RXcsums[1] LinkChgREG[1] 
MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
May 20 15:58:38 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:38 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
May 20 15:58:38 tsi kernel: tg3: eth0: Link is down.
May 20 15:58:38 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:38 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
...
May 20 15:58:43 tsi kernel: tg3: eth0: Link is down.
...
May 20 15:58:46 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:46 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
...
May 20 15:58:58 tsi kernel: tg3: eth0: Link is down.
May 20 15:59:01 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:59:01 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
May 20 15:59:02 tsi kernel: tg3: eth0: Link is down.
May 20 15:59:05 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:59:05 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
etc.
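
For anyone wanting to quantify flapping like this from a message log, here 
is a minimal Python sketch that counts the tg3 "Link is down" messages per 
interface.  It assumes the syslog line layout shown in the excerpt above; 
the names link_events, count_flaps and SAMPLE_LOG are made up for 
illustration and are not part of any tool.

```python
import re

# Matches tg3 link messages in the layout quoted above, e.g.
# "May 20 15:58:43 tsi kernel: tg3: eth0: Link is down."
# Assumption: other syslog configurations may format lines differently.
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"tg3:\s+(\w+):\s+Link is (up|down)"
)


def link_events(lines):
    """Yield (timestamp, interface, state) for each tg3 link message."""
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            yield m.group(1), m.group(2), m.group(3)


def count_flaps(lines, iface="eth0"):
    """Count 'Link is down' messages logged for one interface."""
    return sum(
        1 for _, name, state in link_events(lines)
        if name == iface and state == "down"
    )


# A few lines copied from the log excerpt above.
SAMPLE_LOG = """\
May 20 15:58:38 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:38 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
May 20 15:58:38 tsi kernel: tg3: eth0: Link is down.
May 20 15:58:38 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:43 tsi kernel: tg3: eth0: Link is down.
""".splitlines()

if __name__ == "__main__":
    print(count_flaps(SAMPLE_LOG))
```

On a real system the lines would come from the message log file rather 
than from a string; a healthy interface should show few or no "down" 
messages after boot.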

Here's the output from ethtool for the interface with the problem:

Settings for eth0:
         Supported ports: [ MII ]
         Supported link modes:   10baseT/Half 10baseT/Full
                                 100baseT/Half 100baseT/Full
                                 1000baseT/Half 1000baseT/Full
         Supports auto-negotiation: Yes
         Advertised link modes:  10baseT/Half 10baseT/Full
                                 100baseT/Half 100baseT/Full
                                 1000baseT/Half 1000baseT/Full
         Advertised auto-negotiation: Yes
         Speed: 1000Mb/s
         Duplex: Full
         Port: Twisted Pair
         PHYAD: 1
         Transceiver: internal
         Auto-negotiation: on
         Supports Wake-on: g
         Wake-on: d
         Current message level: 0x000000ff (255)
         Link detected: no

With the older RH 2.4.21-9.0.1.ELsmp kernel I get the same output except
that I get:
...
         Link detected: yes

with no flapping and with the interface up and functioning.
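
The comparison above comes down to the "Link detected" field.  As a rough 
sketch, ethtool output in the layout shown above could be checked from a 
script as follows.  The function names and SAMPLE_ETHTOOL are hypothetical, 
and the parser only handles the single-line "Key: value" fields; it ignores 
the continuation lines of the link-mode lists.

```python
def parse_ethtool(text):
    """Return a dict of the single-line 'Key: value' fields in ethtool output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip headers such as "Settings for eth0:" and lines with no colon.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def link_detected(text):
    """True only if the output reports 'Link detected: yes'."""
    return parse_ethtool(text).get("Link detected") == "yes"


# Abbreviated copy of the ethtool output quoted above.
SAMPLE_ETHTOOL = """\
Settings for eth0:
         Supported ports: [ MII ]
         Speed: 1000Mb/s
         Duplex: Full
         Auto-negotiation: on
         Link detected: no
"""

if __name__ == "__main__":
    info = parse_ethtool(SAMPLE_ETHTOOL)
    print(info["Speed"], info["Duplex"], link_detected(SAMPLE_ETHTOOL))
```

Note that in the failing case the driver still reports a negotiated speed 
and duplex, so the "Link detected" value is the field worth checking.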

Has anybody else seen a comparable problem with RH 2.4.21-15.EL or RH 
2.4.21-15.WS with a GigE interface with the Broadcom NetXtreme BCM5701 
using the tg3 driver?

I'm open to any suggestions.  I certainly don't want a network interface 
problem blocking my kernel updates.

--Jed http://www.nersc.gov/~jed/  




