[libvirt] [RFC] [Patch] Support for Linux macvtap device
Daniel P. Berrange
berrange at redhat.com
Wed Jan 27 12:18:39 UTC 2010
On Tue, Jan 26, 2010 at 05:22:05PM -0500, Stefan Berger wrote:
> "Daniel P. Berrange" <berrange at redhat.com> wrote on 01/26/2010 04:21:56
> >
> > libvir-list, gerhard.stenzel, Vivek Kashyap, arndb
> >
> > Please respond to "Daniel P. Berrange"
> >
> > On Mon, Jan 25, 2010 at 12:47:17PM -0500, Stefan Berger wrote:
> > > Hello!
> > >
> > > The attached patch provides support for the Linux macvtap device for
> > > Qemu by passing a file descriptor to Qemu command line similar to how it
> > > is done with a regular tap device. I have modified the network XML code
> > > to understand a definition as the following one here:
> > >
> > > <network>
> > > <name>vepanet</name>
> > > <uuid>4ebd5168-6321-4757-8397-f6e83484f402</uuid>
> > > <extbridge mode='vepa' dev='eth0'/>
> > > </network>
> >
> > I don't think this is the correct place to be adding this kind
> > of configuration / functionality. The virNetworkPtr / <network>
> > XML is describing a virtual network capability which is *not*
> > directly connected to the LAN. It may be configured to route
> > from the virtual network to the LAN, with optional NAT applied.
> > So while the implementation may use a bridge device, this bridge
> > is not connected to any physical device. Since VEPA is about
> > directly connecting VMs to the LAN, this doesn't really fit here.
>
> Yes, I have re-purposed the network XML to describe an external bridge.
>
> There's the following advantage to this:
>
> - you can migrate a VM between machines that have different types of
> connectivity, i.e., tap and macvtap
>
> - pushing the eth0 into referenced XML makes it independent of the local
> configuration of the host, i.e., on the one host it may be eth0 and on
> the other eth1. eth0 in the above XML could be a physical adapter, or an
> SR-IOV physical adapter or virtual function of an SR-IOV adapter.
I agree that those are both good advantages, but I'm still not liking
the idea of re-purposing the network XML model for this. Unfortunately
I don't yet have a clear alternative that satisfies those goals. I rather
regret that the current stuff uses the name 'network' since it is somewhat
misleading as to its purpose :-) The best idea I can come up with so far
is to imagine a new "switch" object which would basically use the syntax
you are suggesting as an extension for the 'network' object, but without all
the existing bits to do with NAT/routing/DHCP. A 'switch' object might be
something that is also useful for the parallel work being done in firewall
filters in libvirt.
I don't think we necessarily need to consider this mutually exclusive wrt
the direct syntax I suggest for VMs. We could start with the direct syntax
in VMs since that's pretty quick & easy to implement, and then introduce
the idea of a 'switch' object later to give us an alternate host-independent
config.
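To make the direct syntax a little more concrete: its source element carries a mode attribute with the candidate values vepa, pepa and bridge. The following is only a rough sketch of how a parser might map that attribute string onto an enum. The enum and function names are hypothetical and are not taken from the libvirt tree.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not part of the libvirt API. */
enum direct_mode {
    DIRECT_MODE_INVALID = -1,
    DIRECT_MODE_VEPA,
    DIRECT_MODE_PEPA,
    DIRECT_MODE_BRIDGE
};

/* Map the value of the 'mode' attribute to an enum value.
 * Returns DIRECT_MODE_INVALID for NULL or unknown strings. */
enum direct_mode direct_mode_from_string(const char *s)
{
    /* Order must match the enum above. */
    static const char *const names[] = { "vepa", "pepa", "bridge" };
    size_t i;

    if (s == NULL)
        return DIRECT_MODE_INVALID;

    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        if (strcmp(s, names[i]) == 0)
            return (enum direct_mode)i;
    }
    return DIRECT_MODE_INVALID;
}
```

Rejecting unknown strings up front means a domain definition with a misspelled mode would fail at parse time rather than at VM startup.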
> > In the context of bridging a guest to a plain ethernet device, these
> > fit together as follows
> >
> > 1. The virNodeDevPtr APIs are used to discover what physical network
> > devices exist, 'eth0'
> >
> > 2. The virInterfacePtr APIs are used to create a bridge on the host
> > br0, containing the physical device 'eth0'
>
>
> Yes, I suppose this is all done via 'virsh iface-*' commands.
Yes, that's correct.
> > So unless I'm missing something major in my reasoning here I think
> > in the domain XML we end up with two possible configs for guest
> > network interfaces
> >
> >
> > 1. The current one using plain Linux software bridging, which
> > we can't change in an incompatible way
> >
> > <interface type='bridge'>
> > <source bridge='br0'/>
> > <target dev='vnet0'/>
> > </interface>
> >
> > Here, the source device is a bridge previously setup
> > to have a physical device enslaved (regular or SR-IOV)
> > The target device is the plain TAP device
>
> plain TAP device -> no need for change here.
>
> >
> > 2. A new one using hardware bridging, which we can freely
> > define for our new needs
> >
> > <interface type='direct'>
> > <source dev='eth0' mode='vepa|pepa|bridge'/>
> > <target dev='vnet0'/>
> > </interface>
>
> In contrast to the ACLs ( :-) ), where I would regard the ACLs as
> VM-attached data that ideally would migrate along when the VM migrates
> between hosts, in the case of this network attachment I'd not put
> host-specific information in the domain XML as is the case here with the
> 'eth0'. Who knows, maybe it's going to be the SR-IOV virtual adapter eth10
> on the destination side? With the redirection into the network XML (or
> similar) one could define a network XML per VM, create that with
> host-specific information on the destination, i.e., eth10, and then
> migrate the VM previously linked to eth0 via macvtap that then connected
> via eth10. It's more work for upper layers, but if there is a need for
> optimization for throughput, then maybe that's the only way that
> optimizations can be done. Otherwise if all VMs in the data center are
> created with above XML and eth0 then they will all need to stay on eth0 I
> suppose.
> In this context, how will the virtual functions of SR-IOV be administered
> and given to VMs? I suppose their management would be left up to higher
> layers?
As a general rule we leave policy decisions to the management apps and
merely provide them the mechanisms to implement their desired policy.
>
> >
> > Here, source device is a physical device (regular or
> > SR-IOV). The target device is a macvtap device.
> >
> > In both cases the TAP or macvtap device is created on the fly when the
> > VM is booted & destroyed at shutdown (either by the kernel, or manually
> > by libvirt for macvtap).
>
> Yes, as long as libvirt is running when the VM goes down it can delete the
> macvtap device. If not, I am trying to delete all macvtap devices at VM
> startup using the MAC address of the VM (which the macvtap inherits) as
> search/delete criterion.
That is more than sufficient - we already assume libvirtd is running at
time of guest shutdown. We don't officially support the scenario of a
guest shutting down while libvirtd is stopped - we just make a best effort to
cope.
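For illustration, the "search by MAC address" half of that cleanup could look roughly like the sketch below. It is not the code from the patch: the function names are made up, and it assumes the usual Linux sysfs layout in which each interface exposes its MAC in /sys/class/net/<name>/address. It only collects candidate names; a real implementation would still need to confirm that a match is actually a macvtap device before deleting it.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

#define IFACE_NAME_MAX 16   /* same size as IFNAMSIZ on Linux */

/* Compare two textual MAC addresses, ignoring case and any trailing
 * newline (sysfs attribute files end with one). */
int mac_equal(const char *a, const char *b)
{
    size_t la = strcspn(a, "\r\n");
    size_t lb = strcspn(b, "\r\n");

    return la == lb && strncasecmp(a, b, la) == 0;
}

/* Scan net_dir (normally "/sys/class/net") and copy the names of all
 * interfaces whose 'address' file matches mac into names[].
 * Returns the number of matches stored, or -1 if net_dir cannot be
 * opened. Hypothetical helper, not a libvirt function. */
int find_ifaces_by_mac(const char *net_dir, const char *mac,
                       char names[][IFACE_NAME_MAX], int max_names)
{
    DIR *dir = opendir(net_dir);
    struct dirent *ent;
    int found = 0;

    if (dir == NULL)
        return -1;

    while (found < max_names && (ent = readdir(dir)) != NULL) {
        char path[4096];
        char buf[64];
        FILE *fp;

        /* Skip ".", ".." and names too long to be interface names. */
        if (ent->d_name[0] == '.' ||
            strlen(ent->d_name) >= IFACE_NAME_MAX)
            continue;

        snprintf(path, sizeof(path), "%s/%s/address",
                 net_dir, ent->d_name);
        fp = fopen(path, "r");
        if (fp == NULL)
            continue;

        if (fgets(buf, sizeof(buf), fp) != NULL && mac_equal(buf, mac)) {
            strcpy(names[found], ent->d_name);
            found++;
        }
        fclose(fp);
    }

    closedir(dir);
    return found;
}
```

Passing the directory as a parameter rather than hard-coding /sys/class/net keeps the scan testable against a scratch directory.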
> > >
> > > Index: libvirt/src/util/macvtap.c
> > > ===================================================================
> > > --- /dev/null
> > > +++ libvirt/src/util/macvtap.c
> > > @@ -0,0 +1,664 @@
> > > +/*
> > > + * Copyright (C) 2010 IBM Corporation
> > > + *
> > > + * This library is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU Lesser General Public
> > > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * This library is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with this library; if not, write to the Free Software
> > > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > > + *
> > > + * Authors:
> > > + * Stefan Berger <stefanb at us.ibm.com>
> > > + */
> > > +
> > > +#include <config.h>
> > > +
> > > +#if defined(WITH_MACVTAP)
> >
> > [snip].
> >
> > I've not had time to look at the details of this macvtap.c code yet,
> > but I assume it's doing all you need :-) Is there any benefit to using
> > the network libnl.so library, rather than the ioctl()'s directly ?
>
>
> Haven't looked at that library and its API, but can do so if it's
> documented. Would it be ok to keep the current implementation, though?
I don't mind either way. I'll leave the decision up to you since you
know more about this code than me :-) So if you prefer to use the
current code that's fine.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|