multiple vms with same PCI passthrough

Laine Stump laine at redhat.com
Mon Aug 17 15:00:10 UTC 2020


On 8/8/20 11:53 PM, Daniel Black wrote:
> 
> In attempting to isolate vfio-pci problems between two different guest 
> instances, the creation of a second guest (with existing guest shutdown) 
> resulted in:
> 
> Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3 
> is already in use
> Aug 09 12:43:23 grit libvirtd[6716]: internal error: Device 0000:01:00.3 
> is already in use
> Aug 09 12:43:23 grit libvirtd[6716]: Failed to allocate PCI device list: 
> internal error: Device 0000:01:00.3 is already in use

Hmm. Normally, when a device is already in use, the error that gets 
logged looks something like this:

error: Failed to start domain Win10-GPU
error: Requested operation is not valid: PCI device 0000:05:00.0 is in
        use by driver QEMU, domain F30

So you're encountering this in an unexpected place.

> 
> Compiled against library: libvirt 6.1.0
> Using library: libvirt 6.1.0
> Using API: QEMU 6.1.0
> Running hypervisor: QEMU 4.2.1
> (fc32 default install)
> 
> The upstream code also seems to test definitions rather than active 
> uses of the PCI device.


That isn't the case. You're misunderstanding which devices are on the 
list (see below for details).

> 
> My potentially naive patch to correct this (but not the failing test 
> cases) would be:
> 
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 47c671daa0..a00c5e6f44 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -1597,7 +1597,7 @@ int
>   virPCIDeviceListAdd(virPCIDeviceListPtr list,
>                       virPCIDevicePtr dev)
>   {
> -    if (virPCIDeviceListFind(list, dev)) {
> +    if (virPCIDeviceBusContainsActiveDevices(dev, list)) {
>           virReportError(VIR_ERR_INTERNAL_ERROR,
>                          _("Device %s is already in use"), dev->name);
>           return -1;
> 
> Is this too simplistic or undesirable a feature request/implementation?

Only devices that are currently in use by a guest (activePCIHostdevs), 
or that libvirt is in the process of detaching from the guest + vfio 
and rebinding to the device's host driver (inactivePCIHostdevs), are on 
either of the PCI device lists maintained by libvirt. Once a device is 
completely detached from the guest and (if "managed='yes'" was set in 
the XML config) rebound to the natural host driver for the device, it 
is removed from the list and can be used elsewhere.
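
For reference, the "managed='yes'" case described above corresponds to 
a <hostdev> definition along these lines (an illustrative sketch using 
the device address from your log, not a copy of your actual config):

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
      </source>
    </hostdev>

With managed='yes', libvirt unbinds the device from its host driver 
and binds it to vfio-pci when the guest starts, then reverses that 
when the device is fully detached; only between those two points 
should the device appear on either list.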

I just tested this with an assigned GPU + soundcard on two guests to 
verify that it works properly. (I'm running the latest upstream master, 
though, so it's not an exact replication of your test.)


> 
> I'd be more than grateful if someone carries this through, as I'm 
> unsure when I may get time for this.


Can you provide the XML for your <hostdev> in the two guests, and the 
exact sequence of commands that leads to this error? There is 
definitely either a bug in the code or a bug in what you're doing. By 
seeing the sequence of events, we can either attempt to replicate it or 
let you know what change to make to your workflow to eliminate the 
error.
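
Something like the following would be ideal (a hypothetical outline of 
the kind of sequence that would be useful, with placeholder guest 
names; I'm not claiming this is what you ran):

    virsh dumpxml guest1     # paste the <hostdev> elements
    virsh dumpxml guest2     # paste the <hostdev> elements
    virsh start guest1
    virsh shutdown guest1    # plus how you confirmed it had fully stopped
    virsh start guest2       # the point where the error appeared?

One thing to keep in mind: "virsh shutdown" only sends a shutdown 
request and returns immediately, so if the second guest is started 
while the first is still powering off, the device will legitimately 
still be on the active list at that point.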



