[libvirt] PCI passthrough/SR-IOV on Cavium cn889x
alex.williamson at redhat.com
Wed Mar 21 17:35:12 UTC 2018
On Wed, 21 Mar 2018 15:46:01 +0000
Ciprian Barbu <Ciprian.Barbu at enea.com> wrote:
> In the context of running OpenStack on a cluster of Cavium ThunderX cn8890 aarch64 servers, we are trying to attach virtual functions to a VM.
> First some introduction. This Cavium SoC takes a different approach to Virtual Functions than x86 NICs: VFs are always enabled, and there are two types of VFs and *one single* PF, as follows:
> - primary VFs - these are in fact assigned by the system to the physical ports of the server, e.g. em2p1s0f1, em2p1s0f3 etc. below.
> - secondary VFs - the main purpose of these is to provide additional HW queues under SW control (usually DPDK applications) by automatically binding them to the needed physical port.
> - one single "physical" function, device 0002:01:00.0 below, which to the best of my knowledge acts merely as a stub and cannot be assigned an interface name.
> Below is the output of "dpdk-devbind.py -s" which provides some useful information.
> Network devices using DPDK-compatible driver
> ============================================
> 0002:01:00.2 'Device a034' drv=vfio-pci unused=nicvf
> Network devices using kernel driver
> 0000:01:10.0 'THUNDERX BGX (Common Ethernet Interface)' if= drv=thunder-BGX unused=thunder_bgx,vfio-pci
> 0000:01:10.1 'THUNDERX BGX (Common Ethernet Interface)' if= drv=thunder-BGX unused=thunder_bgx,vfio-pci
> 0002:01:00.0 'THUNDERX Network Interface Controller' if= drv=thunder-nic unused=nicpf,vfio-pci
> 0002:01:00.1 'Device a034' if=em2p1s0f1 drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.3 'Device a034' if=em2p1s0f3 drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.4 'Device a034' if=em2p1s0f4 drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.5 'Device a034' if=em2p1s0f5 drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.6 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:00.7 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
> 0002:01:01.0 'Device a034' if= drv=thunder-nicvf unused=nicvf,vfio-pci
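As a reading aid for the listing above, here is a minimal sketch that parses device lines in the format quoted and reports which devices have an interface name. The function name `parse_devbind_status` and the returned dictionary keys are hypothetical, and the regular expression assumes only the line layout visible in this thread, not every format `dpdk-devbind.py` can emit.

```python
import re

# Matches one device line in the layout quoted above, e.g.
# 0002:01:00.1 'Device a034' if=em2p1s0f1 drv=thunder-nicvf unused=nicvf,vfio-pci
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s+"
    r"'(?P<desc>[^']*)'\s+(?P<rest>.*)$"
)


def parse_devbind_status(text):
    """Return one dict per device line; 'ifname' is None when 'if=' is empty or absent."""
    devices = []
    for raw in text.splitlines():
        # Tolerate email quoting ("> ") in front of each line.
        line = raw.lstrip("> ").strip()
        match = LINE_RE.match(line)
        if not match:
            continue  # section headers, underlines, blank lines
        fields = dict(
            kv.split("=", 1) for kv in match.group("rest").split() if "=" in kv
        )
        devices.append(
            {
                "addr": match.group("addr"),
                "desc": match.group("desc"),
                "ifname": fields.get("if") or None,
                "driver": fields.get("drv"),
            }
        )
    return devices
```

Applied to the listing, the PF at 0002:01:00.0 comes back with `ifname` set to `None`, which is the condition the rest of this thread is about.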
> Now for the problem. I don't have a domain definition because libvirt fails to start the domain, but I might be able to find what nova generates. What it tries to do is pass through em2p1s0f3, at address 0002:01:00.3:
> <interface type='hostdev' managed='yes'>
> <address type='pci' domain='0x0002' bus='0x1' slot='0x0' function='0x3'/>
When you use an <interface> definition, I believe libvirt is
interpreting this specifically as a network device and perhaps expects
to find an interface on the PF through which it can do setup. You can
also specify assigned devices via a <hostdev> entry, such as:
<hostdev mode='subsystem' type='pci' managed='yes'>
<address type='pci' domain='0x0002' bus='0x1' slot='0x0' function='0x3'/>
In which case libvirt shouldn't care that the device is a VF and
should have no dependency on a PF interface (or ability to configure
the VF via the PF), I think. Cc'ing libvirt experts. There's a
proposed stub driver in the upstream kernel that would also act in a
similar fashion: the host PF driver is nothing more than a stub that
enables the VFs, so libvirt would need to handle those VFs in a way
that has no dependency on the PF being a network interface, or any
other sort of interface. Thanks,
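A minimal sketch of generating such a <hostdev> entry from a PCI address string, for anyone scripting the workaround. The function name `hostdev_xml` is hypothetical. The <source> wrapper around <address> follows libvirt's documented domain XML format; the fragments quoted in this thread are abbreviated and do not show it.

```python
import xml.etree.ElementTree as ET


def hostdev_xml(pci_addr, managed=True):
    """Build a libvirt <hostdev> element for a PCI address like '0002:01:00.3'."""
    domain, bus, slot_func = pci_addr.split(":")
    slot, func = slot_func.split(".")

    hostdev = ET.Element(
        "hostdev",
        {"mode": "subsystem", "type": "pci", "managed": "yes" if managed else "no"},
    )
    source = ET.SubElement(hostdev, "source")
    # All four components are hexadecimal in the DDDD:BB:SS.F notation.
    ET.SubElement(
        source,
        "address",
        {
            "domain": f"0x{int(domain, 16):04x}",
            "bus": f"0x{int(bus, 16):02x}",
            "slot": f"0x{int(slot, 16):02x}",
            "function": f"0x{int(func, 16):x}",
        },
    )
    return ET.tostring(hostdev, encoding="unicode")


if __name__ == "__main__":
    print(hostdev_xml("0002:01:00.3"))
```

The resulting string could then be placed in the <devices> section of a domain definition. Whether libvirt accepts it on this platform is exactly the open question here, so treat this as something to try rather than a confirmed fix.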
> You can find attached a trimmed libvirtd.log where the main error is:
> 43236: error : virPCIGetVirtualFunctionInfo:2927 : internal error: The PF device for VF /sys/bus/pci/devices/0002:01:00.3 has no network device name
> I have actually spent a few days trying to do some hacks and learn some more. The main idea is that virPCIGetVirtualFunctionInfo fails to find the physical name for the virtual device at address 0002:01:00.3, which as I explained in the introduction is something that this Cavium SoC does not do.
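To reproduce the condition behind that error outside libvirt, a small sysfs check along these lines may help. This is a sketch, not libvirt's implementation: the function name `pf_netdev_names` is hypothetical, and it assumes only the standard Linux sysfs layout, where a VF has a `physfn` symlink to its PF and a PCI device with a bound network interface exposes a `net/` directory.

```python
import os


def pf_netdev_names(vf_addr, sysfs_root="/sys/bus/pci/devices"):
    """List netdev names on the PF behind a VF.

    Returns None if the device has no 'physfn' link (not a VF, or absent),
    and an empty list if the PF exists but exposes no network interface.
    """
    physfn = os.path.join(sysfs_root, vf_addr, "physfn")
    if not os.path.exists(physfn):
        return None
    net_dir = os.path.join(physfn, "net")
    if not os.path.isdir(net_dir):
        return []  # PF has no netdev: the case reported in the log above
    return sorted(os.listdir(net_dir))


if __name__ == "__main__":
    print(pf_netdev_names("0002:01:00.3"))
```

On the system described above, one would expect an empty list for 0002:01:00.3, given that the PF at 0002:01:00.0 shows no interface name in the dpdk-devbind listing.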
> Looking further down the stream, almost all of the helper functions need a linkdev for the physical function, which means that making libvirt work on this system means some heavy refactoring, a solution being to use the sysfs path rather than the interface name.
> This will not work 100% from what I've seen, at least virNetDevGetVfConfig uses netlink to save the admin MAC (part of virNetDevSaveNetConfig), and netlink needs the ifname.
> So I'm quite stuck on finding a workaround/fix for this platform that could potentially be upstreamed, so that we at ENEA are not burdened with maintaining an ugly hack. Right now we are using libvirt 3.5.0, but we can upgrade to something newer if needed.
> The questions, thus, are:
> 1. Is this problem known in the libvirt community?
> 2. Is there any plan to make it work?
> 3. Can you give some pointers on an approach to adapt libvirt to this system?
> 4. Maybe it's worth changing the kernel to assign a sort of dummy interface to the physical function?
> Thanks and sorry for the long email,