[rhn-users] Net interface flap, 2650, RH ES 3.0 2.4.21-15.ELsmp - my resolution
Jed Donnelley
jed at nersc.gov
Tue Sep 7 21:59:40 UTC 2004
All,
Back in late May I sent out a message to both these (Dell and Redhat) lists
about a problem I've been having with using the onboard GigE interfaces on
a Dell 2650 with Redhat ES 3. I've included that message below for any
potential interest. In recent months I did some more testing and learned a
few things about this problem:
1. The problem seems to be somewhat switch dependent. That is, others are
running the same hardware and software on a box but with a different switch
and not having the problem (e.g. there's at least one Dell switch which, if
placed between the 2650 and another switch, eliminates the problem).
The switch I'm connected to is a Foundry FastIron 3 with the Ironcore chip set.
2. With the appropriate switch (as above at least - others?) this problem
is easy to reproduce. Just do a kickstart (or any build) to Redhat ES 3.0
on the Dell 2650 connected to the Foundry switch, then upgrade to the
latest kernel (2.4.21-15.0.4.ELsmp anyway). The initial build works
fine. After the upgrade the box will then no longer communicate on the
network.
I tried a number of workarounds, including turning off auto-negotiation
(with some help from our networking folks) and trying some other NICs that
also used the tg3 driver (an old SysKonnect NIC). None of these were
successful.
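For anyone wanting to repeat the auto-negotiation test, the usual way to turn it off is ethtool's -s option. The sketch below only builds and optionally runs that command. The post does not say which speed and duplex were forced, so the 100 Mb/s full-duplex default is an assumption, and the helper names force_link_command and force_link are hypothetical.

```python
import subprocess


def force_link_command(iface, speed, duplex):
    """Build the ethtool argv that disables auto-negotiation and pins speed/duplex."""
    if duplex not in ("half", "full"):
        raise ValueError("duplex must be 'half' or 'full'")
    return ["ethtool", "-s", iface,
            "speed", str(speed), "duplex", duplex, "autoneg", "off"]


def force_link(iface, speed=100, duplex="full", dry_run=True):
    """Return the command; run it only when dry_run is False (needs root)."""
    cmd = force_link_command(iface, speed, duplex)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Assumed values: the original test settings are not recorded in the post.
print(" ".join(force_link("eth0")))
```

The switch port generally has to be set to the same fixed speed and duplex, otherwise a duplex mismatch is likely.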
Finally I gave up and took the advice of others and installed an Intel NIC
that uses the e1000 driver (Intel PWLA8492MT, "Intel PRO/1000 MT Dual Port
Server Adapter"). Doing so finally solved my problem.
It seems clear there's some sort of interoperability problem involving the
Broadcom chip set, the tg3 driver, and the Foundry switch. I don't have
time to explore this problem further, but I thought I would share with
others what I learned from my testing.
--Jed http://www.nersc.gov/~jed/
Here's my original message for reference:
__________________________________________________________________________________________________
May 25, 2004:
I've been running RH ES 3.0 on a Dell 2650 server for some time with a
GigE interface using the Broadcom NetXtreme BCM5701 technology and the
tg3 driver, most recently with the RH 2.4.21-9.0.1.ELsmp kernel.
Last week I tried upgrading to the RH 2.4.21-15.ELsmp kernel. When I ran
with that newer kernel the network interface on the box went into a loop of
going up and down every few seconds.
Here's what I believe is the relevant part of the message log:
May 20 15:58:30 tsi kernel: Linux version 2.4.21-15.ELsmp
(bhcompile at bugs.build.redhat.com) (gcc version 3.2.3 20030502 (Red Hat
Linux 3.2.3-34)) #1 SMP Thu Apr 22 00:18:24 EDT 2004
...
May 20 15:58:31 tsi kernel: Processors: 4
May 20 15:58:31 tsi kernel: xAPIC support is present
May 20 15:58:31 tsi kernel: Enabling APIC mode: Flat.^IUsing 3 I/O APICs
May 20 15:58:31 tsi kernel: Kernel command line: ro root=/dev/md0
May 20 15:58:31 tsi kernel: Initializing CPU#0
May 20 15:58:31 tsi kernel: Detected 1993.566 MHz processor.
<actually 2 hyperthreaded processors>
...
May 20 15:58:38 tsi kernel: eth0: Tigon3 [partno(BCM95701A10) rev 0105
PHY(5701)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:f8:0f:04
May 20 15:58:38 tsi kernel: eth0: HostTXDS[0] RXcsums[1] LinkChgREG[1]
MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
May 20 15:58:38 tsi kernel: eth1: Tigon3 [partno(BCM95701A10) rev 0105
PHY(5701)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:f8:0f:05
May 20 15:58:38 tsi kernel: eth1: HostTXDS[0] RXcsums[1] LinkChgREG[1]
MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
May 20 15:58:38 tsi kernel: ip_tables: (C) 2000-2002 Netfilter core team
May 20 15:58:38 tsi kernel: ip_conntrack version 2.1 (8192 buckets, 65536
max) - 304 bytes per conntrack
May 20 15:58:38 tsi kernel: tg3.c:v3.1 (April 3, 2004)
May 20 15:58:38 tsi kernel: eth0: Tigon3 [partno(BCM95701A10) rev 0105
PHY(5701)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:f8:0f:04
May 20 15:58:38 tsi kernel: eth0: HostTXDS[0] RXcsums[1] LinkChgREG[1]
MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
May 20 15:58:38 tsi kernel: eth1: Tigon3 [partno(BCM95701A10) rev 0105
PHY(5701)] (PCIX:133MHz:64-bit) 10/100/1000BaseT Ethernet 00:06:5b:f8:0f:05
May 20 15:58:38 tsi kernel: eth1: HostTXDS[0] RXcsums[1] LinkChgREG[1]
MIirq[1] ASF[0] Split[0] WireSpeed[1] TSOcap[0]
May 20 15:58:38 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:38 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
May 20 15:58:38 tsi kernel: tg3: eth0: Link is down.
May 20 15:58:38 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:38 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
...
May 20 15:58:43 tsi kernel: tg3: eth0: Link is down.
...
May 20 15:58:46 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:46 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
...
May 20 15:58:58 tsi kernel: tg3: eth0: Link is down.
May 20 15:59:01 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:59:01 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
May 20 15:59:02 tsi kernel: tg3: eth0: Link is down.
May 20 15:59:05 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:59:05 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
etc.
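A flap pattern like the one in this log can be quantified with a short script. The following is a minimal sketch, assuming each syslog entry sits on one unwrapped line in the format shown above and that month names are in English. The function names and the year parameter are illustrative, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Matches tg3 link-state lines such as:
#   May 20 15:58:43 tsi kernel: tg3: eth0: Link is down.
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"tg3:\s+(?P<iface>\w+):\s+Link is (?P<state>up|down)"
)


def link_events(lines, iface="eth0", year=2004):
    """Return (timestamp, state) pairs for one interface, in log order."""
    events = []
    for line in lines:
        m = LINK_RE.match(line)
        if m and m.group("iface") == iface:
            # The year is supplied by the caller because syslog omits it.
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group("state")))
    return events


def count_flaps(events):
    """Count up -> down transitions."""
    states = [state for _, state in events]
    return sum(1 for a, b in zip(states, states[1:]) if (a, b) == ("up", "down"))


def outage_seconds(events):
    """Seconds between each 'down' and the 'up' that immediately follows it."""
    return [(t2 - t1).total_seconds()
            for (t1, s1), (t2, s2) in zip(events, events[1:])
            if (s1, s2) == ("down", "up")]


SAMPLE = """\
May 20 15:58:43 tsi kernel: tg3: eth0: Link is down.
May 20 15:58:46 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:58:46 tsi kernel: tg3: eth0: Flow control is on for TX and on for RX.
May 20 15:58:58 tsi kernel: tg3: eth0: Link is down.
May 20 15:59:01 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
May 20 15:59:02 tsi kernel: tg3: eth0: Link is down.
May 20 15:59:05 tsi kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
"""

events = link_events(SAMPLE.splitlines())
print(count_flaps(events), outage_seconds(events))
```

On the excerpt used here, each outage lasts about three seconds before the link comes back.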
Here's the output from ethtool for the interface with the problem:
Settings for eth0:
Supported ports: [ MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: g
Wake-on: d
Current message level: 0x000000ff (255)
Link detected: no
With the older RH 2.4.21-9.0.1.ELsmp kernel I get the same output except
that I get:
...
Link detected: yes
with no flapping and with the interface up and functioning.
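The comparison between the two kernels can be made mechanical by parsing both ethtool outputs and listing only the fields that differ. This is a rough sketch that assumes the "Key: value" layout shown above, with continuation lines for the link-mode lists. The helper names and the trimmed sample text are illustrative and are not captured output.

```python
def parse_ethtool(text):
    """Parse `ethtool <iface>` output into a dict of field name -> value."""
    fields = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Settings for"):
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
        elif key is not None:
            # Continuation line, e.g. further link modes under the same field.
            fields[key] += " " + line
    return fields


def diff_settings(old_text, new_text):
    """Return {field: (old_value, new_value)} for every field that differs."""
    old, new = parse_ethtool(old_text), parse_ethtool(new_text)
    return {k: (old.get(k), new.get(k))
            for k in sorted(set(old) | set(new))
            if old.get(k) != new.get(k)}


# Trimmed sample based on the fields reported in the post.
NEW_KERNEL = """Settings for eth0:
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: on
        Link detected: no
"""
OLD_KERNEL = NEW_KERNEL.replace("Link detected: no", "Link detected: yes")

print(diff_settings(OLD_KERNEL, NEW_KERNEL))
# -> {'Link detected': ('yes', 'no')}
```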
Has anybody else seen a comparable problem with RH 2.4.21-15.EL or RH
2.4.21-15.WS with a GigE interface with the Broadcom NetXtreme BCM5701
using the tg3 driver?
I'm open to any suggestions. I certainly don't want a network interface
problem blocking my kernel updates.
--Jed http://www.nersc.gov/~jed/