[PATCH RESEND 06/20] virhostdev.c: virHostdevGetPCIHostDevice() now reports missing device

Laine Stump laine at redhat.com
Mon Feb 22 03:32:13 UTC 2021


On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote:
> Gitlab issue #72 [1] reports that removing SR-IOVs VFs before
> removing the devices from the running domains can have strange
> consequences. QEMU might be able to hotunplug the device inside the
> guest, but Libvirt will not be aware of that, and then the guest is
> now inconsistent with the domain definition.
> 
> There's also the possibility of the VFs removal not succeeding
> while the domain is running but then, as soon as the domain
> is shutdown, all the VFs are removed. Libvirt can't handle
> the removal of the PCI devices while trying to reattach the
> hostdevs, and the Libvirt daemon can be left in an inconsistent
> state (see [2]).
> 
> This patch starts to address the issue related in Gitlab #72, most
> notably the issue described in [2]. When shutting down a domain
> with SR-IOV hostdevs that got missing, virHostdevReAttachPCIDevices()
> is failing the whole process and failing to reattach all the
> PCI devices, including the ones that aren't related to the VFs that
> went missing. Let's make it more resilient with host changes by
> changing virHostdevGetPCIHostDevice() to return an exclusive error
> code '-2' for this case. virHostdevGetPCIHostDeviceList() can then
> tell when virHostdevGetPCIHostDevice() failed to find the PCI
> device of a hostdev and continue to make the list of PCI devices.
> 
> virHostdevReAttachPCIDevices() will now be able to proceed reattaching
> all other valid PCI devices, at least. The 'ghost hostdevs' will be
> handled later on.
> 
> [1] https://gitlab.com/libvirt/libvirt/-/issues/72
> [2] https://gitlab.com/libvirt/libvirt/-/issues/72#note_459032148
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413 at gmail.com>
> ---
>   src/hypervisor/virhostdev.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
> index bd35397f2c..dbba36193b 100644
> --- a/src/hypervisor/virhostdev.c
> +++ b/src/hypervisor/virhostdev.c
> @@ -220,7 +220,8 @@ virHostdevManagerGetDefault(void)
>    * is returned.
>    *
>    * Returns: 0 on success (@pci might be NULL though),
> - *         -1 otherwise (with error reported).
> + *         -1 otherwise (with error reported),
> + *         -2 PCI device not found. @pci will be NULL
>    */
>   static int
>   virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
> @@ -235,6 +236,9 @@ virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
>           hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
>           return 0;
>   
> +    if (!virPCIDeviceExists(&pcisrc->addr))
> +        return -2;

You've created a special return code to mean "This is a PCI device, but 
it isn't present on the host"...

> +
>       actual = virPCIDeviceNew(&pcisrc->addr);
>   
>       if (!actual)
> @@ -270,7 +274,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs)
>           virDomainHostdevDefPtr hostdev = hostdevs[i];
>           g_autoptr(virPCIDevice) pci = NULL;
>   
> -        if (virHostdevGetPCIHostDevice(hostdev, &pci) < 0)
> +        if (virHostdevGetPCIHostDevice(hostdev, &pci) == -1)
>               return NULL;

...But you haven't actually used it. Will it become necessary later to 
have this special value? right now a missing device is treated exactly 
the same as if it isn't a PCI device.

I guess I can understand the conceptual desire to return an error of 
some type in this case though, and there are other places where 
something similar is done (-2 indicates some type of "odd" error), so 
I'll let it pass :-)


Reviewed-by: Laine Stump <laine at redhat.com>

>   
>           if (!pci)
> 




More information about the libvir-list mailing list