[libvirt] [PATCH] kvm/virtio: Set IFF_VNET_HDR when setting up tap fds

Mark McLoughlin markmc at redhat.com
Mon Jan 26 11:52:41 UTC 2009


Hi Rich,

On Mon, 2009-01-26 at 11:39 +0000, Richard W.M. Jones wrote:
> On Mon, Jan 26, 2009 at 10:39:25AM +0000, Mark McLoughlin wrote:
> > IFF_VNET_HDR is a tun/tap flag that allows you to send and receive
> > large (i.e. GSO) packets and packets with partial checksums. Setting
> > the flag means that every packet is preceded by the same header which
> > virtio uses to communicate GSO/csum metadata.
> 
> Translating this for people not familiar with the intricacies of
> recent Linux networking changes ...
> 
> GSO = generic segmentation offload.  In a bare-metal Linux install, the
> network driver can pass the job of splitting large packets over to the
> network card.  In virtualized environments, the "network card" is, for
> example, a virtio backend running in the host.  Because the network
> bridge runs entirely inside the host kernel, there are no physical
> limitations on packet size as there would be if it were real Ethernet,
> so we can use this mechanism to pass over-sized packets to the host.
> Another advantage is that you don't need to compute checksums over the
> packets which are sent this way.

Yep, see also:

   http://blogs.gnome.org/markmc/2008/05/28/checksums-scatter-gather-io-and-segmentation-offload/

> "VNET_HDR" as far as I can gather refers to the special header that
> virtio_net prepends to such over-sized packets.  I'm not quite clear
> if userspace has to add this header, but if so presumably that is done
> inside qemu userspace(?).

VNET_HDR refers to the tap interface, not to virtio_net itself.

A tap interface is what qemu uses to inject ethernet frames into the
kernel networking stack. Normally, you just write() and read() each raw
frame with a single syscall per frame.

VNET_HDR is a flag for tap interfaces to say that frames we read() and
write() will have a struct virtio_net_hdr prepended so as to give the
kernel/qemu information about partial checksums and GSO.
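
To make the framing concrete, here is a rough sketch of what a single
write() or read() buffer looks like once the flag is set. The struct is
a local mirror of struct virtio_net_hdr from <linux/virtio_net.h>, and
vnet_wrap()/vnet_unwrap() are hypothetical helper names I made up for
illustration; they are not taken from qemu or libvirt. I'm also assuming
the 16-bit fields are in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct virtio_net_hdr (see <linux/virtio_net.h>).
 * Assumption: 16-bit fields are in host byte order. */
struct vnet_hdr {
    uint8_t  flags;        /* e.g. "checksum still needs completing" */
    uint8_t  gso_type;     /* 0 means the frame is not a GSO packet */
    uint16_t hdr_len;      /* length of the protocol headers */
    uint16_t gso_size;     /* segment size to use when splitting */
    uint16_t csum_start;   /* where checksumming starts */
    uint16_t csum_offset;  /* where to store the checksum, from csum_start */
};

static_assert(sizeof(struct vnet_hdr) == 10, "header must be 10 bytes");

#define VNET_HDR_F_NEEDS_CSUM 1 /* same value as VIRTIO_NET_HDR_F_NEEDS_CSUM */
#define VNET_HDR_GSO_NONE     0 /* same value as VIRTIO_NET_HDR_GSO_NONE */

/* Build the buffer for one write(): header first, raw frame after it.
 * Returns the total length, or 0 if the output buffer is too small. */
size_t vnet_wrap(uint8_t *out, size_t out_len, const struct vnet_hdr *hdr,
                 const uint8_t *frame, size_t frame_len)
{
    size_t total = sizeof(*hdr) + frame_len;

    if (total > out_len)
        return 0;
    memcpy(out, hdr, sizeof(*hdr));
    memcpy(out + sizeof(*hdr), frame, frame_len);
    return total;
}

/* Split what one read() returned into header and frame.
 * Returns a pointer to the frame, or NULL if the buffer is too short. */
const uint8_t *vnet_unwrap(const uint8_t *buf, size_t buf_len,
                           struct vnet_hdr *hdr, size_t *frame_len)
{
    if (buf_len < sizeof(*hdr))
        return NULL;
    memcpy(hdr, buf, sizeof(*hdr));
    *frame_len = buf_len - sizeof(*hdr);
    return buf + sizeof(*hdr);
}
```

It is still one syscall per frame; the only change is that every buffer
starts with those 10 bytes of metadata.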

We need to set this flag before we bring the interface up and add it to
a bridge. That's why libvirt has to do this rather than just leaving it
up to qemu.
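
Roughly, the setup side looks like the sketch below. This is not the
code from the patch; tap_flags() and tap_open() are hypothetical names,
and probing with TUNGETFEATURES before asking for the flag is just one
reasonable way to cope with kernels that don't support it. Opening
/dev/net/tun and creating the interface needs the usual privileges.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Flags to pass to TUNSETIFF: a tap device without the packet-info
 * prefix, optionally with the virtio_net header enabled. */
short tap_flags(int want_vnet_hdr)
{
    short flags = IFF_TAP | IFF_NO_PI;

    if (want_vnet_hdr)
        flags |= IFF_VNET_HDR;
    return flags;
}

/* Hypothetical helper: create/attach to tap device 'name' and return
 * its fd, or -1 on failure. The flag is set here, before the caller
 * brings the interface up or adds it to a bridge. */
int tap_open(const char *name, int want_vnet_hdr)
{
    struct ifreq ifr;
    unsigned int features = 0;
    int fd = open("/dev/net/tun", O_RDWR);

    if (fd < 0)
        return -1;

    /* Only request IFF_VNET_HDR if the kernel advertises it. */
    if (want_vnet_hdr &&
        (ioctl(fd, TUNGETFEATURES, &features) < 0 ||
         !(features & IFF_VNET_HDR)))
        want_vnet_hdr = 0;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = tap_flags(want_vnet_hdr);
    strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The fd returned here is what would then be handed to qemu.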

> Libvirt sets the flag on the socket, passes the socket by number to
> qemu, and qemu needs to be able to query whether the flag was set.  So
> the patch concerns itself with making sure that all the relevant bits
> of this are supported.
> 
> Correct me if I'm wrong here ...

You're spot on, thanks for elaborating.
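
For the "query whether the flag was set" part, a minimal sketch of how
the receiving side could check an fd it was handed, assuming the
TUNGETIFF ioctl is available. tap_has_vnet_hdr() is a hypothetical name
for illustration, not necessarily what qemu calls it.

```c
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Returns 1 if IFF_VNET_HDR is set on the tap fd, 0 if it is not,
 * and -1 if the fd cannot be queried (not a tap fd, or the kernel
 * lacks TUNGETIFF). */
int tap_has_vnet_hdr(int fd)
{
    struct ifreq ifr;

    memset(&ifr, 0, sizeof(ifr));
    if (ioctl(fd, TUNGETIFF, &ifr) < 0)
        return -1;
    return (ifr.ifr_flags & IFF_VNET_HDR) ? 1 : 0;
}
```

A caller that gets -1 or 0 back should carry on reading and writing raw
frames with no header.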

Cheers,
Mark.