[libvirt] Re: Supporting vhost-net and macvtap in libvirt for QEMU
Daniel P. Berrange
berrange at redhat.com
Fri Dec 18 10:17:24 UTC 2009
On Thu, Dec 17, 2009 at 03:39:05PM -0600, Anthony Liguori wrote:
> Chris Wright wrote:
> >Doesn't sound useful. Low-level, sure worth being able to turn things
> >on and off for testing/debugging, but probably not something a user
> >should be burdened with in libvirt.
> >But I don't understand your -net vhost,fd=X, that would still be -net
> >tap,fd=X, no? IOW, vhost is an internal qemu impl. detail of the virtio
> >backend (or if you get your wish, $nic_backend).
> I don't want to get bogged down in a qemu-devel discussion on
> libvirt-devel :-)
> But from a libvirt perspective, I assume that it wants to open up
> /dev/vhost in order to not have to grant the qemu instance privileges
> which means that it needs to hand qemu the file descriptor to it.
> Given a file descriptor, I don't think qemu can easily tell whether it's
> a tun/tap fd or whether it's a vhost fd. Since they have different
> interfaces, we need libvirt to tell us which one it is. Whether that's
> -net tap,vhost or -net vhost, we can figure that part out on qemu-devel :-)
That is no problem. Since we already do that kind of thing for TAP
devices, it is perfectly feasible for us to also do it for vhost FDs.
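To make the hand-off concrete, here is a minimal sketch of the pattern: a privileged parent opens a device node and the child only ever receives the already-open descriptor number on its command line. This is an illustration in Python, not libvirt's actual code; the helper name `spawn_with_fd` and the `{fd}` placeholder are hypothetical.

```python
import os
import subprocess
import sys


def spawn_with_fd(path, argv_template, flags=os.O_RDWR):
    """Open `path` in the parent and hand the descriptor to a child process.

    Every "{fd}" in argv_template is replaced by the descriptor number,
    in the spirit of an "fd=X" command-line option. The child never needs
    permission to open `path` itself.
    """
    fd = os.open(path, flags)
    try:
        argv = [arg.replace("{fd}", str(fd)) for arg in argv_template]
        # pass_fds keeps this one descriptor open (same number) in the child;
        # all other non-standard descriptors are closed.
        return subprocess.run(argv, pass_fds=(fd,),
                              capture_output=True, text=True)
    finally:
        # The parent no longer needs its copy once the child has exited.
        os.close(fd)
```

The child cannot tell from the number alone what kind of object the descriptor refers to, which is why the option carrying it has to say whether it is a TAP fd or a vhost fd.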
> >>The more interesting invocation of vhost-net though is one where the
> >>vhost-net device backs directly to a physical network card. In this
> >>mode, vhost should get considerably better performance than the
> >>current implementation. I don't know the syntax yet, but I think
> >>it's reasonable to assume that it will look something like -net
> >>tap,dev=eth0. The effect will be that eth0 is dedicated to the
> >tap? we'd want either macvtap or raw socket here.
> I screwed up. I meant to say, -net vhost,dev=eth0. But maybe it
> doesn't matter if libvirt is the one that initializes the vhost device,
> sets up the raw socket (or macvtap), and hands us a file descriptor.
> In general, I think it's best to avoid as much network configuration in
> qemu as humanly possible so I'd rather see libvirt configure the vhost
> device ahead of time and pass us an fd that we can start using.
Agreed, if we can avoid needing to give QEMU CAP_NET_ADMIN then
that is preferred - indeed when libvirt runs QEMU as root, we already
strip it of CAP_NET_ADMIN (and all other capabilities).
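One way to confirm that a running process really lacks CAP_NET_ADMIN is to decode the `CapEff` mask from `/proc/<pid>/status` on Linux. The sketch below assumes that file format and uses bit number 12 for CAP_NET_ADMIN, as defined in `<linux/capability.h>`; the function names are hypothetical.

```python
CAP_NET_ADMIN = 12  # bit index of CAP_NET_ADMIN in <linux/capability.h>


def effective_caps(status_text):
    """Return the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, e.g. "0000000000001000".
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_cap(status_text, cap=CAP_NET_ADMIN):
    """True if capability bit `cap` is set in the effective set."""
    return bool((effective_caps(status_text) >> cap) & 1)
```

For a live process the input would be read with something like `open("/proc/%d/status" % pid).read()`; a QEMU stripped of all capabilities should show an all-zero mask.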
> >>Another model would be to have libvirt see an SR-IOV adapter as a
> >>network pool whereas it handled all of the VF management.
> >>Considering how inflexible SR-IOV is today, I'm not sure whether
> >>this is the best model.
> >We already need to know the VF<->PF relationship. For example, don't
> >want to assign a VF to a guest, then a PF to another guest for basic
> >sanity reasons. As we get better ability to manage the embedded switch
> >in an SR-IOV NIC we will need to manage them as well. So we do need
> >to have some concept of managing an SR-IOV adapter.
> But we still need to support the notion of backing a VNIC to a NIC, no?
> If this just happens to also work with a naive usage of SR-IOV, is that
> so bad? :-)
> Long term, yes, I think you want to manage SR-IOV adapters as if they're
> a network pool. But since they're sufficiently inflexible right now,
> I'm not sure it's all that useful today.
FYI, we have generic capabilities for creating & deleting host devices
via the virNodeDevCreate / virNodeDevDestroy APIs. We use this for
creating & deleting NPIV scsi adapters. If we need to support this for
some types of NICs too, that fits into the model fine.
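For reference, the NPIV case works by handing the node device API an XML description of the virtual adapter to create. The sketch below only builds that XML; it assumes the `<device>`/`<parent>`/`<capability>` layout used for NPIV vports, and the parent name and WWN values in the usage note are made-up placeholders. As far as I recall, the public entry points are named `virNodeDeviceCreateXML` and `virNodeDeviceDestroy`; treat that as an assumption to check against the API reference.

```python
import xml.etree.ElementTree as ET


def npiv_vport_xml(parent, wwnn, wwpn):
    """Build node-device XML describing an NPIV virtual SCSI host.

    parent: name of the physical HBA node device (placeholder value below)
    wwnn/wwpn: world wide node/port names for the new virtual port
    """
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    scsi = ET.SubElement(device, "capability", type="scsi_host")
    fc = ET.SubElement(scsi, "capability", type="fc_host")
    ET.SubElement(fc, "wwnn").text = wwnn
    ET.SubElement(fc, "wwpn").text = wwpn
    return ET.tostring(device, encoding="unicode")
```

A call such as `npiv_vport_xml("scsi_host5", "2001001b32a9da5b", "2101001b32a9da5b")` yields a document that could be passed to the create call. A NIC-oriented variant would presumably follow the same shape with a different capability type, which is the sense in which it fits the existing model.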
> >So I think we want to maintain a concept of the qemu backend (virtio,
> >e1000, etc), the fd that connects the qemu backend to the host (tap,
> >socket, macvtap, etc), and the bridge. The bridge bit gets a little
> >complicated. We have the following bridge cases:
> >- sw bridge
> > - normal existing setup, w/ Linux bridging code
> > - macvlan
> >- hw bridge
> > - on SR-IOV card
> > - configured to simply fwd to external hw bridge (like VEPA mode)
> > - configured as a bridge w/ policies (QoS, ACL, port mirroring,
> > etc. and allows inter-guest traffic and looks a bit like above
> > sw switch)
> > - external
> > - need to possibly inform switch of incoming vport
> I've got mixed feelings here. With respect to sw vs. hw bridge, I
> really think that that's an implementation detail that should not be
> exposed to a user. A user doesn't typically want to think about whether
> they're using a hardware switch vs. software switch. Instead, they
> approach it from, I want to have this network topology, and these
> features enabled.
Agree, there is a lot of low level detail there, and I think it will be
very hard for users or apps to gain enough knowledge to make intelligent
decisions about which they should use. So I don't think we want to expose
all that detail. For a libvirt representation we need to consider it more
in terms of what capabilities each option provides, rather than what
implementation each option uses.
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|