[libvirt] [RFC] [Patch] Support for Linux macvtap device

Daniel P. Berrange berrange at redhat.com
Wed Jan 27 12:18:39 UTC 2010


On Tue, Jan 26, 2010 at 05:22:05PM -0500, Stefan Berger wrote:
> "Daniel P. Berrange" <berrange at redhat.com> wrote on 01/26/2010 04:21:56 
> > 
> > libvir-list, gerhard.stenzel, Vivek Kashyap, arndb
> > 
> > Please respond to "Daniel P. Berrange"
> > 
> > On Mon, Jan 25, 2010 at 12:47:17PM -0500, Stefan Berger wrote:
> > > Hello!
> > > 
> > >  The attached patch provides support for the Linux macvtap device for
> > > Qemu by passing a file descriptor to Qemu command line similar to how it
> > > is done with a regular tap device. I have modified the network XML code
> > > to understand a definition as the following one here:
> > > 
> > > <network>
> > >   <name>vepanet</name>
> > >   <uuid>4ebd5168-6321-4757-8397-f6e83484f402</uuid>
> > >   <extbridge mode='vepa' dev='eth0'/>
> > > </network>
> > 
> > I don't think this is the correct place to be adding this kind
> > of configuration / functionality. The virNetworkPtr / <network>
> > XML is describing a virtual network capability which is *not*
> > directly connected to the LAN. It may be configured to route
> > from the virtual network to the LAN, with optional NAT applied.
> > So while the implementation may use a bridge device, this bridge
> > is not connected to any physical device. Since VEPA is about
> > directly connecting VMs to the LAN, this doesn't really fit here.
> 
> Yes, I have re-purposed the network XML to describe an external bridge.
> 
> There are the following advantages to this:
> 
> - you can migrate a VM between machines that have different types of
>   connectivity, i.e., tap and macvtap
> 
> - pushing the eth0 into referenced XML makes it independent of the local
>   configuration of the host, i.e., on the one host it may be eth0 and on
>   the other eth1. eth0 in the above XML could be a physical adapter, or
>   an SR-IOV physical adapter or virtual function of an SR-IOV adapter.

I agree that those are both good advantages, but I'm still not liking
the idea of re-purposing the network XML model for this. Unfortunately
I don't yet have a clear alternative that satisfies those goals. I rather
regret that the current stuff uses the name 'network' since it is somewhat
misleading as to its purpose :-) The best idea I can come up with so far
is to imagine a new 'switch' object which would basically use the syntax
you are suggesting as an extension for the 'network' object, but without all
the existing bits to do with NAT/routing/DHCP.  A 'switch' object might be
something that is also useful for the parallel work being done on firewall
filters in libvirt.

I don't think we necessarily need to consider this mutually exclusive wrt
the direct syntax I suggest for VMs. We could start with the direct syntax
in VMs since that's pretty quick & easy to implement, and then introduce
the idea of a 'switch' object later to give us an alternate host-independent
config.
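
To illustrate how the two could fit together in the meantime, here is a rough Python sketch of what a management application might do: keep a per-host table that maps a logical switch name to the local device, and generate the direct-syntax interface XML from it at migration time. The table `HOST_SWITCH_MAP`, the function name and the host names are all made up for illustration; none of this is existing libvirt API, and the `<interface type='direct'>` syntax is only the proposal from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-host lookup table: logical switch name -> (local device, mode).
# In practice this would come from whatever inventory the management app keeps.
HOST_SWITCH_MAP = {
    "hostA": {"vepanet": ("eth0", "vepa")},
    "hostB": {"vepanet": ("eth10", "vepa")},
}


def direct_interface_xml(host, switch):
    """Build the proposed <interface type='direct'> XML for `switch` on `host`."""
    dev, mode = HOST_SWITCH_MAP[host][switch]
    iface = ET.Element("interface", {"type": "direct"})
    ET.SubElement(iface, "source", {"dev": dev, "mode": mode})
    return ET.tostring(iface, encoding="unicode")
```

The same logical name 'vepanet' then resolves to eth0 on one host and eth10 on the other, which is the host-independence a 'switch' object would eventually provide inside libvirt itself.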

> > In the context of bridging a guest to a plain ethernet device, these
> > fit together as follows
> > 
> >  1. The virNodeDevPtr APIs are used to discover what physical network
> >     devices exist, 'eth0'
> > 
> >  2. The virInterfacePtr APIs are used to create a bridge on the host
> >     br0, containing the physical device 'eth0'
> 
> 
> Yes, I suppose this is all done via 'virsh iface-*' commands.

Yes, that's correct.

> > So unless I'm missing something major in my reasoning here I think
> > in the domain XML we end up with two possible configs for guest
> > network interfaces
> > 
> > 
> > 1. The current one using plain Linux software bridging, which
> >    we can't change in an incompatible way
> > 
> >     <interface type='bridge'>
> >        <source bridge='br0'/>
> >        <target dev='vnet0'/>
> >     </interface>
> > 
> >    Here, the source device is a bridge previously setup
> >    to have a physical device enslaved (regular or SR-IOV)
> >    The target device is the plain TAP device
> 
> plain TAP device -> no need for change here.
> 
> > 
> > 2. A new one using hardware bridging, which we can freely
> >    define for our new needs
> > 
> >     <interface type='direct'>
> >       <source dev='eth0' mode='vepa|pepa|bridge'/>
> >       <target dev='vnet0'/>
> >     </interface>
> 
> In contrast to the ACLs ( :-) ), where I would regard the ACLs as 
> VM-attached data that ideally would migrate along when the VM migrates 
> between hosts, in the case of this network attachment I'd not put 
> host-specific information in the domain XML as is the case here with the 
> 'eth0'. Who knows, maybe it's going to be the SR-IOV virtual adapter eth10 
> on the destination side? With the redirection into the network XML (or 
> similar) one could define a network XML per VM, create that with 
> host-specific information on the destination, i.e., eth10, and then 
> migrate the VM previously linked to eth0 via macvtap that then connected 
> via eth10. It's more work for upper layers, but if there is a need for 
> optimization for throughput, then maybe that's the only way that 
> optimizations can be done. Otherwise if all VMs in the data center are 
> created with above XML and eth0 then they will all need to stay on eth0 I 
> suppose.

> In this context, how will the virtual functions of SR-IOV be administered
> and given to VMs? I suppose their management would be left up to higher
> layers?

As a general rule we leave policy decisions to the management apps and
merely provide them mechanism to implement their desired policy.

> 
> > 
> >    Here, source device is a physical device (regular or
> >    SR-IOV). The target device is a macvtap device.
> > 
> > In both cases the TAP or macvtap device is created on the fly when the
> > VM is booted & destroyed at shutdown (either by the kernel, or manually
> > by libvirt for macvtap).
> 
> Yes, as long as libvirt is running when the VM goes down it can delete the 
> macvtap device. If not, I am trying to delete all macvtap devices at VM 
> startup using the MAC address of the VM (which the macvtap inherits) as 
> search/delete criterion.

That is more than sufficient - we already assume libvirtd is running at
the time of guest shutdown. We don't officially support the scenario of a
guest shutting down while libvirtd is stopped - we just make a best effort
to cope.
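
The search-by-MAC cleanup described above could look roughly like the following Python sketch. It only locates candidate devices by reading the `address` file that Linux exposes under `/sys/class/net/<iface>/`; it is not the code from the patch. The function name, the `prefix` filter and the idea that the devices are named `macvtap*` are assumptions made for illustration.

```python
import os


def find_ifaces_by_mac(mac, sysfs_net="/sys/class/net", prefix="macvtap"):
    """Return names of interfaces under `sysfs_net` whose MAC equals `mac`.

    Only names starting with `prefix` are considered; that naming
    convention is an assumption, not something the patch guarantees.
    """
    wanted = mac.strip().lower()
    matches = []
    for name in sorted(os.listdir(sysfs_net)):
        if not name.startswith(prefix):
            continue
        try:
            with open(os.path.join(sysfs_net, name, "address")) as f:
                if f.read().strip().lower() == wanted:
                    matches.append(name)
        except OSError:
            # Device vanished or has no address file; skip it.
            continue
    return matches
```

Each name returned would then be a candidate for removal (for example with `ip link delete <name>`) before the new macvtap device is created for the guest.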

> > > 
> > > Index: libvirt/src/util/macvtap.c
> > > ===================================================================
> > > --- /dev/null
> > > +++ libvirt/src/util/macvtap.c
> > > @@ -0,0 +1,664 @@
> > > +/*
> > > + * Copyright (C) 2010 IBM Corporation
> > > + *
> > > + * This library is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU Lesser General Public
> > > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * This library is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with this library; if not, write to the Free Software
> > > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307  USA
> > > + *
> > > + * Authors:
> > > + *     Stefan Berger <stefanb at us.ibm.com>
> > > + */
> > > +
> > > +#include <config.h>
> > > +
> > > +#if defined(WITH_MACVTAP)
> > 
> > [snip].
> > 
> > I've not had time to look at the details of this macvtap.c code yet,
> > but I assume it's doing all you need :-) Is there any benefit to using
> > the network libnl.so library, rather than the ioctl()'s directly ?
>  
> 
> Haven't looked at that library and its API, but can do so if it's 
> documented. Would it be ok to keep the current implementation, though?

I don't mind either way. I'll leave the decision up to you since you
know more about this code than me :-) So if you prefer to use the
current code that's fine.


Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|



