RE: network stalls on Fedora Core 3

On Friday, April 15, 2005 1:17 AM Keith Fetterman wrote:
>I continued my attempts to diagnose the source of my network stalls.
>Here is what I found out:
>- The problem only occurs when downloading large files over a T1.  The
>problem does not occur when transferring files between two computers on
>the same 100BaseT switched subnet.  I successfully copied a 2.5GB file
>between the Fedora Core 3 (FC3) box and another Linux box.
>We have a point-to-point T1 that connects our office to our facility,
>where our production servers and our route to the Internet are located.
>The problem occurs when downloading large files (20MB and up) over
>this T1.  It happens randomly, but I usually get the first 10MB before
>the network stalls.

I have similar problems over a Linksys (BEFSR11) router.

>Using Ethereal, I discovered a problem where I think the FC3 system is
>not recovering from receiving a bad TCP packet.  When the FC3 system
>receives a bad TCP packet, it sends a response to the remote server
>requesting it to resend the packet.  The remote server does, but then
>the FC3 system sends the request again for the same packet.  It does
>this several times and then gives up.  I don't think the FC3 OS is
>processing the resent packet properly, so it retries several times
>before eventually giving up on the download.  The lower-level OS call
>doesn't return a failure, it just stops responding, which is why from
>the scp level it looks like a stall.

The symptoms you describe are exactly what I see happening in Ethereal,
although you are going over a T1 while I am going over a cable modem.
However, the cable modem and its connection are not at fault.  Something
is amiss between the Linksys router and FC3, and it isn't the cabling,
because I can connect the FC3 box directly to the cable modem and it
then works.  I wonder if there is any correlation here that could point
to the source of the problem.  Do you connect through a router before
your CSU/DSU to your T1 line?

>Our RedHat Enterprise 3 WS systems are handling this problem 
>i.e., when they receive a bad packet, they send the response 
>a resend.  They receive the retransmitted packet and then continue the 

Interesting.  I will give this a try and see if I exhibit the same

>Tonight, on the computer that was having the problem, I replaced the
>Fedora Core 3 OS with RedHat Enterprise 3 (RHE3) OS.  The problem did
>not exist with RHE3.  I was able to download 10 - 40MB files over the T1
>without a problem.  So I think the problem is with FC3.
>Does anyone have any idea why the FC3 OS might be having this problem
>and what I can do or who to report the problem to?

I believe that the problem is related to either a kernel setting or the
driver.  Is it possible to "transplant" the NIC driver from RHEL3 to see
if this resolves the issue?  I would think that it would only involve
copying over the kernel module for your NIC.  Also, how do the kernel
settings for TCP/IP differ between RHEL3 and FC3?  Go to
/proc/sys/net/ipv4 on each system and compare.  I will try this out this
weekend if time permits.
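
For the /proc/sys/net/ipv4 comparison, here is a minimal sketch of one way
to do it.  It assumes Python 3 is available on whichever box runs it; the
script and its function names (read_settings, parse_dump, diff_settings)
are my own illustration, not an existing tool.  It dumps every readable
setting under the directory as "name = value" lines and then reports the
settings that differ between two such dumps.

```python
import os
import sys


def read_settings(root="/proc/sys/net/ipv4"):
    """Return {relative_path: value} for every readable file under root."""
    settings = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as fh:
                    # Collapse tabs/newlines so multi-field values compare cleanly.
                    value = " ".join(fh.read().split())
            except OSError:
                continue  # some entries are write-only or need root
            settings[os.path.relpath(path, root)] = value
    return settings


def format_dump(settings):
    """Render settings as sorted 'name = value' lines."""
    return "\n".join(f"{k} = {v}" for k, v in sorted(settings.items()))


def parse_dump(text):
    """Parse 'name = value' lines back into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return sorted (name, value_in_a, value_in_b) for settings that differ.

    A value of None means the setting is missing on that side.
    """
    keys = sorted(set(a) | set(b))
    return [(k, a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)]


def main(args):
    if args == ["dump"]:
        print(format_dump(read_settings()))
    elif len(args) == 3 and args[0] == "diff":
        with open(args[1]) as fa, open(args[2]) as fb:
            a, b = parse_dump(fa.read()), parse_dump(fb.read())
        for name, va, vb in diff_settings(a, b):
            print(f"{name}: {va!r} -> {vb!r}")
    else:
        print("usage: dump | diff FILE_A FILE_B", file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Run it with "dump" on each system, redirect the output to a file, copy
one file over, and run it with "diff" on the two files.  Settings that
exist on only one kernel show up with None on the other side, which is
to be expected between different kernel versions.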



