[libvirt] RFC: managing "pci passthrough" usage of sriov VFs via a new network forward type

Gerhard Stenzel gstenzel at linux.vnet.ibm.com
Tue Aug 23 10:11:21 UTC 2011


On Mon, 2011-08-22 at 05:17 -0400, Laine Stump wrote:
> For some reason beyond my comprehension, the designers of SRIOV ethernet 
> cards decided that the virtual functions (VF) of the card (each VF 
> corresponds to an ethernet device, e.g. "eth10") should each be given a 
> new+different+random MAC address each time the hardware is rebooted. 

I have read that this is to avoid wasting MAC addresses from the vendor's
pool that might never be used.

> Normally, udev keeps a persistent table that associates each known MAC 
> address with an ethernet device name - any time an ethernet device with 
> a previously-unknown MAC address is found, a new device name is 
> allocated ("eth11", etc) and the newly found MAC address is associated 
> with that device name. When an ethernet device is an SRIOV VF, though, 
> udev doesn't persist the MAC address, so at each boot a device is found 
> with a new MAC address, but the device name from the previous boot is 
> "unused" so magically the device ends up with the same name even though 
> the MAC address has changed.

RHEL 6.1 seems to use the PCI address to manage the interface name
in /etc/udev/rules.d/70-persistent-net.rules:

# PCI device 0x8086:0x10ed (ixgbevf)
SUBSYSTEM=="net", ACTION=="add", ATTR{dev_id}=="0x0", KERNELS=="0000:15:10.0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth8"
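
To make it explicit what this rule keys on, here is a minimal Python
sketch that splits the rule into its match keys ("==") and its
assignments ("="). The helper name split_rule and the regular expression
are my own and only handle the simple quoted key/value form shown above,
not the full udev rule syntax.

```python
import re

# Rule text copied from the message above, joined onto one line.
RULE = ('SUBSYSTEM=="net", ACTION=="add", ATTR{dev_id}=="0x0", '
        'KERNELS=="0000:15:10.0", ATTR{type}=="1", KERNEL=="eth*", '
        'NAME="eth8"')

# One token: key (optionally with {attr}), operator, quoted value.
TOKEN = re.compile(
    r'\s*([A-Za-z_]+(?:\{[^}]*\})?)\s*(==|=)\s*"([^"]*)"\s*(?:,|$)')


def split_rule(rule):
    """Return (matches, assignments) dictionaries for one rule line."""
    matches, assignments = {}, {}
    pos = 0
    while pos < len(rule):
        m = TOKEN.match(rule, pos)
        if m is None:
            raise ValueError("cannot parse rule at offset %d" % pos)
        key, op, value = m.groups()
        # "==" compares against a device property, "=" assigns a value.
        (matches if op == "==" else assignments)[key] = value
        pos = m.end()
    return matches, assignments


matches, assignments = split_rule(RULE)
print(matches["KERNELS"], "->", assignments["NAME"])
# Rules that pin a name to a MAC address match on ATTR{address}.
print("matches on MAC address:", "ATTR{address}" in matches)
```

The rule has no ATTR{address} match, so the name eth8 follows the PCI
address 0000:15:10.0 and not whatever MAC address the VF has at boot.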

> When this device is assigned to a guest via PCI passthrough, though, the 
> guest doesn't have the necessary information to realize that it's 
> actually an SRIOV VF, so the guest's udev persists the MAC address - on 
> the first boot of host+guest, the guest will see it has, e.g., mac 
> address 11:22:33:44:55:66 and udev will add an entry to its persistent 
> table remembering that 11:22:33:44:55:66="eth0". If the host reboots, 
> though, the VF will get a new MAC address, and when the guest boots, it 
> will see a new MAC address (e.g. "66:55:44:33:22:11") and think that 
> there's a different card, so it will create a new device (and a new udev 
> entry - 66:55:44:33:22:11="eth1"). This will repeat each time the host 
> reboots, with the obvious undesired consequences.
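
The effect described above can be modelled in a few lines of Python.
This is only a toy model of a MAC-keyed persistent table, assuming the
guest always picks the lowest unused ethN; the function names are
hypothetical and this is not how udev is implemented.

```python
def next_free_name(table):
    """Pick the lowest ethN that is not yet recorded in the table."""
    used = set(table.values())
    n = 0
    while "eth%d" % n in used:
        n += 1
    return "eth%d" % n


def guest_boot(table, mac):
    """Return the name the guest uses for the device with this MAC."""
    if mac not in table:
        # Unknown MAC: the guest assumes a new card and persists a new name.
        table[mac] = next_free_name(table)
    return table[mac]


table = {}  # the guest's persistent MAC -> name table
# First boot, boot after a host reboot (new VF MAC), then a guest-only reboot.
boots = ["11:22:33:44:55:66", "66:55:44:33:22:11", "66:55:44:33:22:11"]
names = [guest_boot(table, mac) for mac in boots]
print(names)
```

Every new MAC adds one entry, so the same passed-through VF shows up
under a new name after each host reboot, while a guest-only reboot keeps
the name.
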
> 
> This makes using SRIOV VFs via PCI passthrough very unpalatable. The 
> problem can be solved by setting the MAC address of the ethernet device 
> prior to assigning it to the guest, but of course the <hostdev> element 
> used to assign PCI devices to guests has no place to specify a MAC 
> address (and I'm not sure it would be appropriate to add something that 
> function-specific to <hostdev>). Dave Allan and I have discussed a 
> different possible method of eliminating this problem (using a new 
> forward type for libvirt networks) that I've outlined below. Please let 
> me know what you think - is this reasonable in general? If so, what 
> about the details? If not, any counter-proposals to solve the problem?
> 
> Providing Predictable/Configurable MAC Addresses for SRIOV VFs used via 
> PCI Passthrough:
> 
> 1) <network> will have a new forward type='hardware'. When forward 
> type='hardware', a pool of ethernet interfaces can be specified, just as 
> for the forward types "bridge", "vepa", "private", and "passthrough". At 
> this point, that's the only thing that I've determined is needed in the 
> network definition.

Maybe type='hostdev' instead?
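
For discussion, a network definition along these lines could be
generated as in the Python sketch below. Everything here is an
assumption on my side: the existing forward types are written as a
"mode" attribute on <forward> with <interface dev='...'/> children, so
the sketch does the same, while the value "hardware" (or "hostdev") and
the network name "vf-pool" are placeholders and not an existing schema.

```python
import xml.etree.ElementTree as ET


def build_network_xml(name, forward_mode, devices):
    """Build a <network> with a forward element listing a device pool."""
    net = ET.Element("network")
    ET.SubElement(net, "name").text = name
    # The forward mode value is the proposed one, not an existing one.
    fwd = ET.SubElement(net, "forward", {"mode": forward_mode})
    for dev in devices:
        # One <interface> per ethernet device in the pool.
        ET.SubElement(fwd, "interface", {"dev": dev})
    return ET.tostring(net, encoding="unicode")


xml_text = build_network_xml("vf-pool", "hardware", ["eth10", "eth11"])
print(xml_text)
```

Swapping "hardware" for "hostdev" only changes the attribute value; the
pool of interfaces stays the same.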

> 
> 2) In a domain's <interface> definition, when type='network', if the 
> network has a forward type='hardware', the domain code will request an 
> unused ethernet device from the network driver, then do the following:
> 
> 3) save the ethernet device name in interface/actual so that it can be 
> easily retrieved if libvirtd is restarted
> 
> 4) Set the MAC address of the given ethernet device according to the 
> domain <interface> config.
> 
> 5) Use the NodeDevice API to learn all the necessary PCI 
> domain/slot/bus/function and add a (non-persisting) <hostdev> element to 
> the guest's config before starting it up.
> 
> 6) When the guest is eventually destroyed, the ethernet device will be 
> freed back to the network pool for use by another guest.
> 
> One problem this doesn't solve is that when a guest is migrated, the PCI 
> info for the allocated ethernet device on the destination host will 
> almost surely be different. Is there any provision for dealing with this 
> in the device passthrough code? If not, then migration will still not be 
> possible.
> 
> Although I realize that many people are predisposed to not like the idea 
> of PCI passthrough of ethernet devices (including me), it seems that 
> it's going to be used, so we may as well provide the management tools to 
> do it in a sane manner.

If I understand this correctly, this outlines an "implicit" PCI
passthrough, and there is no need to provide an explicit <hostdev/>
element in the domain XML. Guest configs using an explicit <hostdev/>
element would still expose the problem outlined above, correct?
Any plans for those?
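
To check my understanding of steps 2) to 6), here is a rough Python
sketch of the flow. It is not libvirt code: the VfPool class, the
function names and the hand-written device-to-PCI-address mapping are
hypothetical (the proposal would get the PCI address via the NodeDevice
API), and the "ip link" command is only built as a list, never run.

```python
import re


class VfPool:
    """Hypothetical stand-in for the network driver's device pool."""

    def __init__(self, devices):
        # device name -> PCI address, supplied by hand in this sketch
        self.free = dict(devices)
        self.in_use = {}

    def allocate(self, guest):
        """Step 2: hand out an unused device and remember who has it."""
        if not self.free:
            raise RuntimeError("no unused device left in the pool")
        dev = sorted(self.free)[0]
        pci = self.free.pop(dev)
        self.in_use[dev] = (guest, pci)  # step 3: state to be saved
        return dev, pci

    def release(self, dev):
        """Step 6: return the device to the pool."""
        _guest, pci = self.in_use.pop(dev)
        self.free[dev] = pci


def set_mac_command(dev, mac):
    """Step 4: command that sets the MAC from the <interface> config."""
    return ["ip", "link", "set", "dev", dev, "address", mac]


def hostdev_xml(pci):
    """Step 5: transient <hostdev> for a 'dddd:bb:ss.f' PCI address."""
    m = re.fullmatch(
        r"([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])", pci)
    if m is None:
        raise ValueError("unexpected PCI address: %r" % pci)
    return ("<hostdev mode='subsystem' type='pci' managed='yes'>\n"
            "  <source>\n"
            "    <address domain='0x%s' bus='0x%s' slot='0x%s'"
            " function='0x%s'/>\n"
            "  </source>\n"
            "</hostdev>" % m.groups())


pool = VfPool({"eth8": "0000:15:10.0"})
dev, pci = pool.allocate("guest1")
print(" ".join(set_mac_command(dev, "11:22:33:44:55:66")))
print(hostdev_xml(pci))
pool.release(dev)
```

In this picture the <hostdev> element is generated per start and never
written to the persistent config, which is why I call the passthrough
"implicit".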
