[Fedora-xen] TCP checksum corruption

Daniel P. Berrange berrange at redhat.com
Tue May 8 17:54:58 UTC 2007


On Tue, May 08, 2007 at 12:41:02PM -0500, Mike McGrath wrote:
> Daniel P. Berrange wrote:
> >On Tue, May 08, 2007 at 11:39:14AM -0500, Mike McGrath wrote:
> >  
> >>We're using xen heavily in Fedora's Infrastructure and presently a 
> >>number of the xen domU hosts are experiencing terrible checksum issues.  
> >>I've tried the ethtool -K eth0 tx off fix and it didn't work.
> >
> >What sort of network config have you got with these?  Bridging straight
> >to the physical device, or NAT'd?
> Bridge

That's good - should avoid the NAT related bugs there then.

> >There are a couple issues at play:
> >
> > - There is a general bug in 2.6.20 that breaks checksum offload
> >   when used with NAT.
> > - In 2.6.19 or later Dom0 will transmit to guests using checksum
> >   offload, so the DHCP client in the guest will mistakenly think the
> >   packet has a corrupt checksum.
> >
> >Addressing the first bug requires disabling checksum offload on eth0 in
> >the guest. ethtool -K eth0 tx off    in the guest should do it.
> >
> >The 2nd is really difficult to address, since the FC6 install images
> >themselves have a broken DHCP client, for example, so we need to work
> >around it in the kernel. This can be done by disabling checksums on the
> >device in Dom0 - running ethtool -K <dev> tx off on any of vifN.0,
> >xenbr0 or peth0 should do it.
> >
> >NB, ignore eth0 in Dom0; that's a fake device, so turning off tx on it
> >does not fix things.
> >
> >So in summary, getting it working in the general case requires:
> >
> >   ethtool -K eth0 tx off    in guest
> >
> >And
> >
> >   ethtool -K <dev> tx off   on whatever bridge device the guest is 
> >   attached to
> >  
> I've actually run that on every interface on every dom[0,U] on the box
> :).  I've also tried it on two other hosts.  One was a RHEL5 dom0 and the
> other had different hardware but was also an FC6 dom0.  I can arrange
> access to the box if you're interested.

Ok, that makes absolutely no sense to me now :-)  Every time I hit it I was
able to solve it eventually by setting 'tx off' on some combo of devices.
The RHEL-5 Dom0 kernel also already has the necessary fixes in, which makes
it even odder that it doesn't work for you.
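
For reference, trying 'tx off' on a combination of devices and then checking
what each device reports could be sketched roughly as below. This is only a
minimal sketch: the helper names tx_off and tx_state are made up for
illustration, and the device names in the usage note (vif1.0, xenbr0, peth0)
are assumed examples rather than anything taken from the affected hosts.

```shell
#!/bin/sh
# Sketch only: helper names and device names are illustrative assumptions.

# Print (DRY_RUN=1, the default) or run "ethtool -K <dev> tx off" per device.
tx_off() {
    for dev in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ethtool -K $dev tx off"
        else
            ethtool -K "$dev" tx off
        fi
    done
}

# Extract the tx-checksumming value from "ethtool -k <dev>" output on stdin.
tx_state() {
    awk -F': ' '/^tx-checksumming:/ { print $2 }'
}
```

In Dom0 this might be used as  DRY_RUN=0 tx_off vif1.0 xenbr0 peth0  (as
root), followed by  ethtool -k xenbr0 | tx_state  to see whether the device
now reports TX checksumming as off. Leaving DRY_RUN unset only prints the
commands, which is a safer first step on a production box.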

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=| 



