[libvirt] Re: Supporting vhost-net and macvtap in libvirt for QEMU

Thu Dec 17 21:16:48 UTC 2009

* Anthony Liguori (aliguori at linux.vnet.ibm.com) wrote:
> There are two modes worth supporting for vhost-net in libvirt.  The
> first mode is where vhost-net backs to a tun/tap device.  This
> behaves in very much the same way that -net tap behaves in qemu
> today.  Basically, the difference is that the virtio backend is in
> the kernel instead of in qemu so there should be some performance
> improvement.
> 
> Currently, libvirt invokes qemu with -net tap,fd=X where X is an
> already open fd to a tun/tap device.  I suspect that after we merge
> vhost-net, libvirt could support vhost-net in this mode by just
> doing -net vhost,fd=X.  I think the only real question for libvirt
> is whether to provide a user visible switch to use vhost or to just
> always use vhost when it's available and it makes sense.
> Personally, I think the latter makes sense.

A user-visible switch doesn't sound useful.  At a low level, sure, it's
worth being able to turn things on and off for testing/debugging, but
it's probably not something a user should be burdened with in libvirt.

But I don't understand your -net vhost,fd=X; that would still be -net
tap,fd=X, no?  IOW, vhost is an internal qemu impl. detail of the virtio
backend (or if you get your wish, $nic_backend).
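A minimal Python sketch of that idea, assuming the management layer keeps
passing the tap fd exactly as it does today and treats vhost as a backend
detail chosen automatically.  The helper names, the /dev/vhost-net device
node, and the ",vhost=on" suffix are assumptions for illustration only;
nothing in this thread settles the syntax.

```python
import os

VHOST_DEV = "/dev/vhost-net"  # assumed device node for the in-kernel backend


def vhost_available(path=VHOST_DEV):
    """Return True if the vhost device node exists and is read/write accessible."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def build_net_args(tap_fd, use_vhost):
    """Build the qemu -net argument for an already open tap fd.

    The "tap,fd=X" part is what libvirt passes today.  The ",vhost=on"
    suffix is a hypothetical spelling, not agreed syntax.
    """
    if tap_fd < 0:
        raise ValueError("tap_fd must be a valid file descriptor")
    opt = "tap,fd=%d" % tap_fd
    if use_vhost:
        opt += ",vhost=on"
    return ["-net", opt]
```

A caller would do something like build_net_args(fd, vhost_available()),
so the user never sees a switch and the tap plumbing stays the same.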

> The more interesting invocation of vhost-net though is one where the
> vhost-net device backs directly to a physical network card.  In this
> mode, vhost should get considerably better performance than the
> current implementation.  I don't know the syntax yet, but I think
> it's reasonable to assume that it will look something like -net
> tap,dev=eth0.   The effect will be that eth0 is dedicated to the
> guest.

tap?  we'd want either macvtap or raw socket here.

> On most modern systems, there is a small number of network devices
> so this model is not all that useful except when dealing with SR-IOV
> adapters.  In that case, each physical device can be exposed as many
> virtual devices (VFs).  There are a few restrictions here though.
> The biggest is that currently, you can only change the number of VFs
> by reloading a kernel module so it's really a parameter that must be
> set at startup time.
> 
> I think there are a few ways libvirt could support vhost-net in this
> second mode.  The simplest would be to introduce a new tag similar
> to <source network='br0'>.  In fact, if you probed the device type
> for the network parameter, you could probably do something like
> <source network='eth0'> and have it Just Work.

We'll need to keep track of more than just the other en
We need to 0

> Another model would be to have libvirt see an SR-IOV adapter as a
> network pool where it handles all of the VF management.
> Considering how inflexible SR-IOV is today, I'm not sure whether
> this is the best model.

We already need to know the VF<->PF relationship.  For example, we don't
want to assign a VF to one guest and then its PF to another guest, for
basic sanity reasons.  As we get better ability to manage the embedded
switch in an SR-IOV NIC we will need to manage that as well.  So we do
need to have some concept of managing an SR-IOV adapter.
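A rough Python sketch of the VF<->PF sanity check, assuming the kernel
exposes the relationship through the "physfn" and "virtfnN" symlinks under
/sys/bus/pci/devices.  The function names and the "assigned" set are
hypothetical; this is not how libvirt does or will do it, just one way to
express the rule.

```python
import os

SYSFS_PCI = "/sys/bus/pci/devices"


def pf_of(addr, root=SYSFS_PCI):
    """Return the PF address for a VF, or None if addr is not a VF."""
    link = os.path.join(root, addr, "physfn")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def vfs_of(addr, root=SYSFS_PCI):
    """Return the sorted VF addresses that hang off a PF (empty if none)."""
    dev = os.path.join(root, addr)
    vfs = []
    for name in os.listdir(dev):
        path = os.path.join(dev, name)
        if name.startswith("virtfn") and os.path.islink(path):
            vfs.append(os.path.basename(os.readlink(path)))
    return sorted(vfs)


def conflicts(candidate, assigned, root=SYSFS_PCI):
    """True if candidate is already assigned, or its PF or one of its VFs is."""
    if candidate in assigned:
        return True
    parent = pf_of(candidate, root)
    if parent is not None and parent in assigned:
        return True
    return any(vf in assigned for vf in vfs_of(candidate, root))
```

Two VFs of the same PF do not conflict with each other here; only the
PF-vs-VF overlap is rejected.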

So I think we want to maintain a concept of the qemu backend (virtio,
e1000, etc), the fd that connects the qemu backend to the host (tap,
socket, macvtap, etc), and the bridge.  The bridge bit gets a little
complicated.  We have the following bridge cases:

- sw bridge
  - normal existing setup, w/ Linux bridging code
  - macvlan
- hw bridge
  - on SR-IOV card
    - configured to simply fwd to external hw bridge (like VEPA mode)
    - configured as a bridge w/ policies (QoS, ACL, port mirroring,
      etc.); allows inter-guest traffic and looks a bit like the sw
      bridge above
  - external
    - need to possibly inform switch of incoming vport

And, we can have a hybrid.  E.g., no reason one VF can't be shared by a
few guests.
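
The three concepts above (qemu backend, the fd connection to the host, and
the bridge) could be modelled roughly as below.  This is only a sketch in
Python to make the taxonomy concrete; every class, field, and function
name is hypothetical, and the enum members simply mirror the lists in this
mail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Backend(Enum):
    VIRTIO = "virtio"
    E1000 = "e1000"


class Connection(Enum):
    TAP = "tap"
    SOCKET = "socket"
    MACVTAP = "macvtap"


class Bridge(Enum):
    SW_LINUX = "sw: Linux bridging code"
    SW_MACVLAN = "sw: macvlan"
    HW_SRIOV_FORWARD = "hw: SR-IOV, forward to external bridge (VEPA-like)"
    HW_SRIOV_POLICY = "hw: SR-IOV embedded bridge with policies"
    HW_EXTERNAL = "hw: external switch"


@dataclass(frozen=True)
class GuestNic:
    guest: str
    backend: Backend
    connection: Connection
    bridge: Bridge
    host_dev: Optional[str] = None  # hypothetical: VF or bridge name on the host


def sharers(nics, host_dev):
    """Return the sorted guest names attached to the same host device."""
    return sorted({n.guest for n in nics if n.host_dev == host_dev})
```

The hybrid case then shows up as several GuestNic entries with the same
host_dev, e.g. two guests using macvtap on top of one VF.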