[libvirt] Re: Supporting vhost-net and macvtap in libvirt for QEMU

Fri Dec 18 10:17:24 UTC 2009

On Thu, Dec 17, 2009 at 03:39:05PM -0600, Anthony Liguori wrote:
> Chris Wright wrote:
> >
> >Doesn't sound useful.  Low-level, sure worth being able to turn things
> >on and off for testing/debugging, but probably not something a user
> >should be burdened with in libvirt.
> >
> >But I don't understand your -net vhost,fd=X, that would still be -net
> >tap,fd=X, no?  IOW, vhost is an internal qemu impl. detail of the virtio
> >backend (or if you get your wish, $nic_backend).
> >  
> 
> I don't want to get bogged down in a qemu-devel discussion on 
> libvirt-devel :-)
> 
> But from a libvirt perspective, I assume that it wants to open up 
> /dev/vhost in order to not have to grant the qemu instance privileges 
> which means that it needs to hand qemu the file descriptor to it.
> 
> Given a file descriptor, I don't think qemu can easily tell whether it's 
> a tun/tap fd or whether it's a vhost fd.  Since they have different 
> interfaces, we need libvirt to tell us which one it is.  Whether that's 
> -net tap,vhost or -net vhost, we can figure that part out on qemu-devel :-)

That is no problem. Since we already do that kind of thing for TAP
devices, it is perfectly feasible for us to also do it for vhost FDs.
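
As a rough illustration of the hand-off described above (a privileged
parent opens the device, the child only inherits the descriptor and is
told what kind it is), here is a minimal sketch. It is not libvirt or
QEMU code: a pipe stands in for the tap/vhost device, and the names
run_child_with_fd, demo and the "kind" argument are made up for the
example.

```python
import os
import subprocess
import sys

# Code run by the unprivileged child. It never opens anything itself; it is
# told the fd number and the kind of fd on its command line.
CHILD_CODE = (
    "import os, sys\n"
    "kind, fd = sys.argv[1], int(sys.argv[2])\n"
    "data = os.read(fd, 64).decode()\n"
    "print(f'{kind}:{data}')\n"
)


def run_child_with_fd(fd: int, kind: str) -> str:
    """Start a child that inherits `fd` and is told which kind of fd it is."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, kind, str(fd)],
        pass_fds=(fd,),  # only this descriptor is inherited, same number
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def demo() -> str:
    # Stand-in for the privileged open of a tap or vhost device (assumption:
    # a pipe is used here so the sketch runs anywhere without privileges).
    read_end, write_end = os.pipe()
    try:
        os.write(write_end, b"hello")
    finally:
        os.close(write_end)
    try:
        return run_child_with_fd(read_end, "vhost")
    finally:
        os.close(read_end)


if __name__ == "__main__":
    print(demo())
```

The point of the "kind" argument is the one Anthony raises: the child
cannot easily tell from the descriptor alone what it is talking to, so
the parent has to say so explicitly.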

> 
> >>The more interesting invocation of vhost-net though is one where the
> >>vhost-net device backs directly to a physical network card.  In this
> >>mode, vhost should get considerably better performance than the
> >>current implementation.  I don't know the syntax yet, but I think
> >>it's reasonable to assume that it will look something like -net
> >>tap,dev=eth0.   The effect will be that eth0 is dedicated to the
> >>guest.
> >>    
> >
> >tap?  we'd want either macvtap or raw socket here.
> >  
> 
> I screwed up.  I meant to say, -net vhost,dev=eth0.  But maybe it 
> doesn't matter if libvirt is the one that initializes the vhost device, 
> sets up the raw socket (or macvtap), and hands us a file descriptor.
> 
> In general, I think it's best to avoid as much network configuration in 
> qemu as humanly possible so I'd rather see libvirt configure the vhost 
> device ahead of time and pass us an fd that we can start using.

Agreed, if we can avoid needing to give QEMU CAP_NET_ADMIN then
that is preferred. Indeed, when libvirt runs QEMU as root, we already
strip it of CAP_NET_ADMIN (and all other capabilities).
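
For anyone wanting to check this on a running guest, a small sketch of
decoding the effective capability mask that Linux reports in
/proc/<pid>/status follows. It only inspects the mask, it does not show
how libvirt drops capabilities. The helper names are made up for the
example; CAP_NET_ADMIN being bit 12 comes from linux/capability.h.

```python
CAP_NET_ADMIN = 12  # bit index defined in linux/capability.h


def effective_caps(status_text: str) -> str:
    """Return the hex CapEff mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split()[1]
    raise ValueError("no CapEff line found")


def has_capability(mask_hex: str, cap: int) -> bool:
    """True if bit `cap` is set in the hex capability mask."""
    return bool(int(mask_hex, 16) & (1 << cap))


def process_has_net_admin(pid: int) -> bool:
    """Check a live process (Linux only; pid is supplied by the caller)."""
    with open(f"/proc/{pid}/status") as f:
        return has_capability(effective_caps(f.read()), CAP_NET_ADMIN)
```

A QEMU process that has been stripped of all capabilities should show a
CapEff mask of all zeroes, so has_capability() returns False for it.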

> >>Another model would be to have libvirt see an SR-IOV adapter as a
> >>network pool, where it handles all of the VF management.
> >>Considering how inflexible SR-IOV is today, I'm not sure whether
> >>this is the best model.
> >>    
> >
> >We already need to know the VF<->PF relationship.  For example, don't
> >want to assign a VF to a guest, then a PF to another guest for basic
> >sanity reasons.  As we get better ability to manage the embedded switch
> >in an SR-IOV NIC we will need to manage them as well.  So we do need
> >to have some concept of managing an SR-IOV adapter.
> >  
> 
> But we still need to support the notion of backing a VNIC to a NIC, no?  
> If this just happens to also work with a naive usage of SR-IOV, is that 
> so bad? :-)
> 
> Long term, yes, I think you want to manage SR-IOV adapters as if they're 
> a network pool.  But since they're sufficiently inflexible right now, 
> I'm not sure it's all that useful today.

FYI, we have generic capabilities for creating & deleting host devices
via the virNodeDeviceCreateXML / virNodeDeviceDestroy APIs. We use this for
creating & deleting NPIV SCSI adapters. If we need to support this for
some types of NICs too, that fits into the model fine.
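
To give an idea of what the NPIV case looks like, here is a sketch that
builds the kind of XML description passed to the node device creation
API. The element layout follows the libvirt node device XML format as I
understand it; the function name, the parent name and the WWNs in the
usage note are placeholders, not real devices.

```python
import xml.etree.ElementTree as ET


def npiv_vport_xml(parent: str, wwnn: str, wwpn: str) -> str:
    """Build a node device description for an NPIV virtual port.

    `parent` is the physical HBA (for example a scsi_hostN name); the
    WWNN/WWPN values identify the new virtual port.
    """
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    scsi_host = ET.SubElement(device, "capability", type="scsi_host")
    fc_host = ET.SubElement(scsi_host, "capability", type="fc_host")
    ET.SubElement(fc_host, "wwnn").text = wwnn
    ET.SubElement(fc_host, "wwpn").text = wwpn
    return ET.tostring(device, encoding="unicode")
```

With the Python bindings, the resulting string would be handed to
something like conn.nodeDeviceCreateXML(xml, 0), and the returned
device object torn down later with its destroy() method. A NIC variant
would need its own capability type, which does not exist today.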

> >So I think we want to maintain a concept of the qemu backend (virtio,
> >e1000, etc), the fd that connects the qemu backend to the host (tap,
> >socket, macvtap, etc), and the bridge.  The bridge bit gets a little
> >complicated.  We have the following bridge cases:
> >
> >- sw bridge
> >  - normal existing setup, w/ Linux bridging code
> >  - macvlan
> >- hw bridge
> >  - on SR-IOV card
> >    - configured to simply fwd to external hw bridge (like VEPA mode)
> >    - configured as a bridge w/ policies (QoS, ACL, port mirroring,
> >      etc. and allows inter-guest traffic and looks a bit like above
> >      sw switch)
> >  - external
> >    - need to possibly inform switch of incoming vport
> 
> I've got mixed feelings here.  With respect to sw vs. hw bridge, I 
> really think that that's an implementation detail that should not be 
> exposed to a user.  A user doesn't typically want to think about whether 
> they're using a hardware switch vs. software switch.  Instead, they 
> approach it from, I want to have this network topology, and these 
> features enabled.

Agreed, there is a lot of low-level detail there, and I think it will be
very hard for users or apps to gain enough knowledge to make intelligent
decisions about which they should use. So I don't think we want to expose
all that detail. For a libvirt representation we need to consider it more
in terms of what capabilities each option provides, rather than what
implementation each option uses.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|