[libvirt] Network device abstraction aka virtual switch - V3

Stefan Berger stefanb at linux.vnet.ibm.com
Fri Jun 17 00:57:12 UTC 2011


On 06/12/2011 08:29 PM, Laine Stump wrote:
> This is a followup to 
> https://www.redhat.com/archives/libvir-list/2011-April/msg00591.html
> (and an even earlier draft) which I alluded to here:
>
>    https://www.redhat.com/archives/libvir-list/2011-June/msg00383.html
>
> Network device abstraction aka virtual switch - V3
> ==================================================
>
[...]
> The core goal of this proposal, though, is to replace type=bridge and
> type=direct from the domain interface XML with new types of <network>
> definitions so that the domain can just give "type='network'" and have
> all the necessary details filled in at runtime. This basically means
> we're adding several bridging modes (the submodes of "direct" have
> been flattened out here):
>
>  - Bridged network, eth + bridge + tap
>  - Bridged network, eth + macvtap + vepa
>  - Bridged network, eth + macvtap + private
>  - Bridged network, eth + macvtap + passthrough
>  - Bridged network, eth + macvtap + bridge
>
> Another "future expansion" could be to add:
>
>  - Bridged network, with VPN
>

This case sounds to me like the first one, with (for example) OpenVPN's
tap interface also added to the bridge.

> Likewise, support for other technologies, such as openvswitch and VDE
> would each be another entry on this list.
>
> (Dan also listed each of the above "+sriov" separately, but that ends
> up being handled in an orthogonal manner (by just specifying a pool of
> interfaces for a single network), so I'm only giving the abbreviated
> list)
>
> I. Changes to domain <interface> element
> ========================================
>
[...]
> <virtualport> element of <interface>
> ------------------------------------
>
> Since many of the attributes/sub-elements of <virtualport> (used by
> some modes of "direct" interface connections) are identical for all
> interfaces connecting to any given switch, most of the information in
> <virtualport> will be optional in the domain's interface definition -
> it can be filled in from a similar <virtualport> element that will be
> added to the <network> definition.
>
> Some parameters in <virtualport> ("instanceid", for example) must be
> unique for every interface, though, so those will still be specified
> in the <interface> XML. The two <virtualport> elements will be OR'ed
> at runtime to arrive at the actual set of parameters that are
> used.
>
> (Open Question: What should be the policy when a parameter is
> specified in both places? Should one take precedence? Or should it be
> considered an error?)
>
I think the one in the domain XML should take precedence, on the
assumption that the user wants a parameter to differ for one particular
interface.
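
To make the two candidate policies concrete, here is a minimal Python
sketch. It is not libvirt code: merge_virtualport() and its "strict"
flag are hypothetical names, and I am assuming the parameters can be
treated as plain key/value pairs. The sample values are taken from the
examples in this proposal, except typeidversion="3", which I made up to
force a conflict.

```python
def merge_virtualport(network_params, interface_params, strict=False):
    """Combine <virtualport> parameters from the network and the interface.

    Hypothetical helper, not libvirt code. With strict=False the value
    from the interface wins on conflict; with strict=True a conflicting
    value is treated as an error instead.
    """
    merged = dict(network_params)
    for key, value in interface_params.items():
        if strict and key in merged and merged[key] != value:
            raise ValueError(f"conflicting virtualport parameter: {key}")
        merged[key] = value
    return merged


network = {"managerid": "11", "typeid": "1193047", "typeidversion": "2"}
interface = {
    "instanceid": "09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f",
    "typeidversion": "3",  # invented value, only here to show a conflict
}
merged = merge_virtualport(network, interface)
```

With strict=False the interface's typeidversion replaces the network's;
with strict=True the same input raises an error, which corresponds to
the "considered an error" option in the open question.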

> portgroup attribute of <source>
> -------------------------------
>
> The <source> element of an interface definition will be able to
> optionally specify a "portgroup" attribute. If portgroup is *NOT*
> given, the default (first) portgroup of the network will be used (if
> any are defined). If portgroup *IS* specified, the source network must
> have a portgroup by that name (or the domain startup/migration will
> fail), and the attributes of that portgroup will be used for the
> connection. Here is an example <interface> definition that has both a
> reduced <virtualport> element, as well as a portgroup attribute:
>
> <interface type='network'>
> <source network='red-network' portgroup='engineering'/>
> <virtualport type="802.1Qbg">
> <parameters instanceid="09b11c53-8b5c-4eeb-8f00-d84eaa0aaa4f"/>
> </virtualport>
> <mac address='de:ad:be:ef:ca:fe'/>
> </interface>
>
> (The specifics of what can be in a portgroup are given below)
>
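The lookup rule described above (default to the first portgroup, fail
on an unknown name) could be sketched as follows. This is only an
illustration under my own assumptions: select_portgroup() is a
hypothetical name, and the portgroups are modelled as an ordered list
of (name, attributes) pairs rather than parsed XML.

```python
def select_portgroup(portgroups, name=None):
    """Pick a portgroup from an ordered list of (name, attributes) pairs.

    Hypothetical helper, not libvirt code. Returns None when no name is
    requested and the network defines no portgroups; raises LookupError
    when a requested name does not exist, which is where domain
    startup/migration would fail.
    """
    if name is None:
        # No portgroup requested: fall back to the first one, if any.
        return portgroups[0] if portgroups else None
    for group in portgroups:
        if group[0] == name:
            return group
    raise LookupError(f"network has no portgroup named {name!r}")


groups = [("default", {"typeid": "1"}), ("engineering", {"typeid": "1193047"})]
chosen = select_portgroup(groups, "engineering")
fallback = select_portgroup(groups)
```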
>
> II. Changes to <network> definition
> ===================================
>
>
[...]

> A description of each:
>
> bridge-brctl - equivalent to "<interface type='bridge'>" in the
>                interface definition. The bridge device to use would be
>                given in the existing <forward dev='xxx'>. (Dan also
>                suggests putting this in <network>'s <bridge
>                name='xxx'/> - opinions?)
>                (Question: better name for this?)
>
Just 'bridge'?
> vepa         - same as "<interface type='direct'>..." with <source
>                mode='vepa'/>
>
> private      - <interface type='direct'> ... <source mode='private'/>
>
> passthrough  - <interface type='direct'> ... <source mode='passthrough'/>
>
> bridge-macvtap - <interface type='direct'> ... <source mode='bridge'/>
>                (Question: better name for this?)
>
> Interface Pools
> ---------------
>
> In many cases, a single host network may have multiple physical
> network devices associated with it (especially in the case of an
> SRIOV-capable ethernet card, which will have several "virtual
> functions" associated with a single physical ethernet connection). The
> host will at least want to balance the load of multiple guests between
> these multiple devices, and may even require (in the case of
> passthrough mode, for example) that only a single guest interface be
> attached to each host device.

>
> The current specification for <forward> only allows for a single "dev"
> attribute, though. In order to support multiple device names, we will
> extend <forward> to allow 0 or more <interface> sub-elements:
>
> <forward mode='vepa' dev='eth10'>
> <interface dev='eth10'/>
> <interface dev='eth11'/>
> <interface dev='eth12'/>
> <interface dev='eth13'/>
> </forward>
>
So this now becomes a pool, with libvirt keeping track of which of
these interfaces are already in use.
> Note that, as a convenience, the first of these elements will always
> be a duplicate of the "dev" attribute in <forward> itself. (Is this
> necessary/desirable?)
It feels like this would require special handling in the code. If there
were no dev in the forward node, one would have to look into the pool
right away anyway. So maybe the dev attribute in the forward node
should simply be ignored if there is a pool of interfaces.
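
A small Python sketch of that rule, using only the standard library's
xml.etree.ElementTree. forward_devices() is a hypothetical name and
this is just one way to read my suggestion, not how libvirt does or
will do it: if <forward> has <interface> children they form the pool
and "dev" is ignored, otherwise "dev" is the only candidate.

```python
import xml.etree.ElementTree as ET


def forward_devices(forward_xml):
    """Return the candidate host devices for a <forward> element.

    Hypothetical helper, not libvirt code. A pool of <interface>
    sub-elements takes priority; the dev attribute is used only when
    no pool is present.
    """
    root = ET.fromstring(forward_xml)
    pool = [iface.get("dev") for iface in root.findall("interface")]
    if pool:
        return pool
    dev = root.get("dev")
    return [dev] if dev else []


with_pool = forward_devices(
    "<forward mode='vepa' dev='eth10'>"
    "<interface dev='eth11'/><interface dev='eth12'/>"
    "</forward>"
)
without_pool = forward_devices("<forward mode='vepa' dev='eth10'/>")
```

With this reading there is no need to duplicate the dev attribute as
the first pool entry.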
>
> In the case of mode='passthrough', only one guest interface can be
> connected to a device at a time. libvirt will keep track of which
> devices are in use, and attempt to assign a free device; failure to
> assign a device will result in a failure of the domain to
> start/migrate. For the other direct modes, libvirt will simply keep
> track of the number of guest interfaces currently using each device,
> and attempt to keep them balanced.
>
> (Open question: where will we keep track of this allocation/assignment?)
>
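Wherever the state ends up being stored, the allocation policy itself
(exclusive use for passthrough, balancing for the other direct modes)
might look roughly like the following Python sketch. InterfacePool and
its methods are hypothetical names, the counters live only in memory,
and "balanced" is interpreted here as "pick the least-used device",
which is my assumption rather than something the proposal states.

```python
class InterfacePool:
    """In-memory usage counters for a pool of host devices.

    Hypothetical sketch, not libvirt code.
    """

    def __init__(self, devices, mode):
        self.mode = mode
        self.usage = {dev: 0 for dev in devices}

    def acquire(self):
        """Return a device for a new guest interface, or raise if none fits."""
        if not self.usage:
            raise RuntimeError("no devices in pool")
        # Least-used device; ties go to the device listed first.
        dev = min(self.usage, key=self.usage.get)
        if self.mode == "passthrough" and self.usage[dev] > 0:
            # Every device already has a guest attached.
            raise RuntimeError("no free device for passthrough")
        self.usage[dev] += 1
        return dev

    def release(self, dev):
        """Give a device back when the guest interface goes away."""
        if self.usage.get(dev, 0) == 0:
            raise ValueError(f"device {dev!r} is not in use")
        self.usage[dev] -= 1


vepa_pool = InterfacePool(["eth10", "eth11"], mode="vepa")
assigned = [vepa_pool.acquire() for _ in range(3)]
```

A failed acquire() corresponds to the domain failing to start/migrate.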
> Portgroups
> -----------
>
> A <portgroup> (sub-element of <network>) is just a way of easily
> putting connections to the network into different classes, with each
> class having a different level/type of service. Each <network> can
> have multiple <portgroup> elements, and each <portgroup> has a name,
> as well as various attributes associated with it. The first thing we
> will use portgroups for is as an alternate place to specify
> <virtualport> parameters:
>
> <portgroup name='engineering'>
> <virtualport type="802.1Qbg">
> <parameters managerid="11" typeid="1193047" typeidversion="2"/>
> </virtualport>
> </portgroup>
>
> Anything that is valid in an interface's <virtualport> is also valid 
> here.
>
> The next thing to specify in a portgroup will be bandwidth limiting /
> QoS configuration. Since I don't know exactly what's needed for that,
> I won't specify it here.
>
> If anything is specified both directly under <network> and in a
> <portgroup>, the value in portgroup will take precedence. (Again -
> what will the precedence of items specified in the <interface> be?)
>
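If the <interface> were to take precedence over the <portgroup>, in
line with my comment on <virtualport> earlier, the three levels could
be resolved as in this Python sketch. effective_settings() is a
hypothetical name, the settings are again modelled as plain
dictionaries with placeholder keys, and the interface-over-portgroup
ordering is my assumption, since the proposal leaves it open.

```python
from collections import ChainMap


def effective_settings(network, portgroup, interface):
    """Resolve settings with interface > portgroup > network precedence.

    Hypothetical helper, not libvirt code. ChainMap searches its
    mappings left to right, so the leftmost mapping that defines a key
    wins.
    """
    return dict(ChainMap(interface, portgroup, network))


resolved = effective_settings(
    network={"a": "net", "b": "net", "c": "net"},
    portgroup={"b": "pg", "c": "pg"},
    interface={"c": "iface"},
)
```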
> EXAMPLES
> --------
>
[...]
> =============
>
> Open Questions:
>
[...]
> * Where will we keep track of the count of guest interfaces connected
>   to each host interface device, and where will we keep track of which
>   device is being used by a particular guest interface? In the
>   network/domain XML?
As a user/administrator I may be interested to see it in both places, 
network and domain XML. At least that way I wouldn't have to dig too much...

I think this is necessary work and it feels like a lot of new complexity 
will need to be added...

Regards,
    Stefan



