[libvirt] [PATCH 0/5] Interface pools and passthrough mode

Tue Dec 6 12:41:25 UTC 2011

On 12/06/2011 10:16 AM, Daniel P. Berrange wrote:
> On Mon, Dec 05, 2011 at 02:00:52PM -0500, Laine Stump wrote:
>> On 12/05/2011 06:37 AM, Daniel P. Berrange wrote:
>>> On Tue, Nov 29, 2011 at 08:29:35PM -0500, Laine Stump wrote:
>>>> On 11/29/2011 02:53 PM, Daniel P. Berrange wrote:
>>>>> On Tue, Nov 29, 2011 at 03:46:13PM +0000, Shradha Shah wrote:
>>>>>> Interface Pools and Passthrough mode:
>>>>>>
>>>>>> Current Method:
>>>>>> The passthrough mode uses a macvtap direct connection to connect each guest to the network. The physical interface to be used is picked from among those listed in <interface> sub-elements of the <forward> element.
>>>>>>
>>>>>> The current specification for <forward> extends to allow 0 or more <interface> sub-elements:
>>>>>> Example:
>>>>>> <forward mode='passthrough' dev='eth10'>
>>>>>> <interface dev='eth10'/>
>>>>>> <interface dev='eth12'/>
>>>>>> <interface dev='eth18'/>
>>>>>> <interface dev='eth20'/>
>>>>>> </forward>
>>>>>>
>>>>>> However, with an ethernet card that has 64 VFs or more, the above method gets tedious.
>>>>> Ignoring the ABI issue, I'm concerned that as we get PFs with an increasingly
>>>>> large number of VFs, we may well *not* want to associate all VFs with a single
>>>>> virtual network definition. e.g., we might want to put 32 VFs in one network and
>>>>> 32 VFs in another network.  Or if we have 2 PFs, we might want to interleave
>>>>> VFs from several PFs across virtual networks. If all we can do is list the
>>>>> PF in the XML, we lose significant flexibility in how VFs are assigned.
>>>> My first concern too when I saw the patch was the semantic change
>>>> (but also the loss of flexibility), which is obviously a no-go. It's
>>>> a convenient capability to have though, so it would be nice to get
>>>> it in somehow. What if we allowed including all the VFs associated
>>>> with a PF by adding an extra attribute?  e.g.:
>>>>
>>>> <interface dev='eth10' type='sriov'/>
>>> This feels a little bit wrong to me.
>>>
>>>> (or whatever is more appropriate in place of "sriov"). Or possibly a
>>>> different element type could be used:
>>>>
>>>> <pf dev='eth10'/>
>>> I like this idea, because it is providing additional useful info,
>>> rather than changing existing elements, so it is maximally
>>> compatible.
>>>
>>>> (didn't want to spend time thinking of a better name than "pf"...).
>>>>
>>>> At the time the network is created, this would cause libvirt to get
>>>> the list of all VFs for the given PF and put them into the pool.
>>>> This could be used instead of, or in combination with, the existing
>>>> <interface dev='eth1'/>  form. Thus the existing semantics would be
>>>> preserved, the flexibility of specifying individual devices would be
>>>> retained, and the desired convenience of adding all VFs of a PF with
>>>> a single line would be added.
>>> IIUC, what you're suggesting is the following behaviour:
>>>
>>>  * Explicit interface list. App inputs:
>>>
>>>     <forward mode='passthrough'>
>>>       <interface dev='eth10'/>
>>>       <interface dev='eth11'/>
>>>       <interface dev='eth12'/>
>>>       <interface dev='eth13'/>
>>>     </forward>
>>>
>>>    libvirt does not change XML
>>>
>>>  * Automatic interface list from PF. App inputs:
>>>
>>>      <forward mode='passthrough'>
>>>        <pf dev='eth0'/>
>>>      </forward>
>>>
>>>    libvirt expands XML to be
>>>
>>>     <forward mode='passthrough'>
>>>       <pf dev='eth0'/>
>>>       <interface dev='eth10'/>
>>>       <interface dev='eth11'/>
>>>       <interface dev='eth12'/>
>>>       <interface dev='eth13'/>
>>>     </forward>
>>>
>>> This is good because all previous info is still intact
>>
>>
>> I actually hadn't thought of modifying the XML and displaying it in
>> net-dumpxml (or net-dumpxml --inactive), which is what I think you
>> may be implying here. This would have the advantage of making a
>> management application's job easier when displaying status
>> (available interfaces, etc), but could lead to confusion when a
>> host's hardware was changed (since there would be no detectable
>> difference between dev elements that were entered by hand, and those
>> that were automatically derived from a pf element). Also, it would
>> end up cluttering up the config file again, which is part of what
>> this is trying to avoid (although eliminating the need to type in
>> all N vf names is the primary concern).
>>
>> Unless we come up with a way of differentiating between
>> auto-generated <interface> elements (including keeping track of the
>> parent <pf>) and those entered by hand, I think the XML itself
>> shouldn't be changed, but only the contents of the interface pool in
>> memory.
> 
> As with domains, every network has both an active and inactive
> XML config. When the network is not running, we should only be
> showing the user provided <interface> elements. Only once you
> start the network, do we automatically fill in <interface>
> elements based on <pf>. So if we add a flag for virNetworkGetXMLDesc()
> like VIR_NETWORK_XML_INACTIVE, then you can distinguish by comparing
> the live XML to the inactive XML.
> 
>

I agree that the inactive XML should show only the user-provided data.
But in the case of the active XML, would it be wise to display the entire VF pool in the XML?
I think this would clutter the config file when a NIC supports 127 VFs per port, as is the case with Solarflare.
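
To make the active/inactive distinction concrete, here is a minimal sketch of the behaviour Daniel describes: the inactive XML keeps only what the user wrote, the live XML gains one <interface> per VF, and comparing the two shows which entries were auto-generated. This is not libvirt code. The function names, the network name and the VF list passed in are all hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Inactive (user-provided) definition: one <pf> plus one hand-written <interface>.
INACTIVE_XML = """
<network>
  <name>passthrough-net</name>
  <forward mode='passthrough'>
    <pf dev='eth0'/>
    <interface dev='eth5'/>
  </forward>
</network>
"""


def expand_pf(inactive_xml, vfs_by_pf):
    """Build a live XML string by appending one <interface> per VF of each <pf>.

    vfs_by_pf is a hypothetical mapping {pf_name: [vf_name, ...]} standing in
    for whatever VF discovery the driver would do when the network starts.
    """
    root = ET.fromstring(inactive_xml)
    forward = root.find("forward")
    known = {iface.get("dev") for iface in forward.findall("interface")}
    for pf in forward.findall("pf"):
        for vf in vfs_by_pf.get(pf.get("dev"), []):
            if vf not in known:  # do not duplicate hand-written entries
                ET.SubElement(forward, "interface", dev=vf)
                known.add(vf)
    return ET.tostring(root, encoding="unicode")


def interface_devs(xml_text):
    """Return the dev attribute of every <interface> under <forward>, in order."""
    root = ET.fromstring(xml_text)
    return [iface.get("dev") for iface in root.findall("./forward/interface")]


def auto_generated(inactive_xml, live_xml):
    """Interfaces present in the live XML but absent from the inactive XML."""
    user_provided = set(interface_devs(inactive_xml))
    return [dev for dev in interface_devs(live_xml) if dev not in user_provided]
```

With 127 VFs per port the live XML produced this way would carry 127 extra <interface> lines per <pf>, which is the clutter I am worried about, although the inactive XML stays at a single <pf> line.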

Also, the free VFs are discovered only after virNetworkDefParseXML and networkAllocateActualDevice (in passthrough mode). Is there a way of modifying the active XML at this point?
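
For reference, the VF discovery itself can be sketched outside libvirt. The example below assumes the usual Linux sysfs layout, where each VF of a PF appears as /sys/class/net/<pf>/device/virtfnN and a VF that currently has a host network device exposes its name under virtfnN/net/. The function name and the sysfs_root parameter are hypothetical; this is an illustration, not what the patch series does.

```python
import glob
import os


def list_vf_netdevs(pf, sysfs_root="/sys"):
    """Return the netdev names of the VFs of a PF, ordered by VF index.

    Assumes the layout <sysfs_root>/class/net/<pf>/device/virtfnN/net/<name>.
    VFs without a net/ directory (no host netdev at the moment) are skipped.
    """
    device_dir = os.path.join(sysfs_root, "class", "net", pf, "device")
    prefix = "virtfn"

    # Collect (index, path) pairs so that virtfn10 sorts after virtfn2.
    indexed = []
    for path in glob.glob(os.path.join(device_dir, prefix + "*")):
        suffix = os.path.basename(path)[len(prefix):]
        if suffix.isdigit():
            indexed.append((int(suffix), path))

    names = []
    for _, path in sorted(indexed):
        net_dir = os.path.join(path, "net")
        if os.path.isdir(net_dir):
            names.extend(sorted(os.listdir(net_dir)))
    return names
```

Calling list_vf_netdevs('eth0') on a host with SR-IOV enabled would give the list that a <pf dev='eth0'/> element would expand to under this assumption.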

Shradha

> Daniel