Add options to device xml to skip reattach of pci passthrough devices.

Fri Jun 18 14:43:07 UTC 2021

On 6/16/21 4:15 PM, Daniel Henrique Barboza wrote:
> 
> 
> On 6/9/21 4:38 PM, Manish Mishra wrote:
>> Hi Everyone,
>>
>> We want to add extra options to the device XML to skip reattach of PCI
>> passthrough devices. The following is the XML format currently used for
>> PCI passthrough devices added to a domain.
>>
>> <hostdev mode='subsystem' type='pci' managed='yes'>
>>    <source>
>>        <address domain='0x0000' bus='0x00' slot='0x1a' function='0x7'/>
>>    </source>
>> </hostdev>
>>
>> When we pass the managed='yes' flag in the XML, libvirt takes
>> responsibility for detaching the device on domain (guest VM) start and
>> reattaching it on domain shutdown. We observed cases where guest VM
>> shutdown takes a long time because it is blocked on the reattach
>> operation for a PCI passthrough device. Because the domain lock is held
>> during this time, libvirt becomes mostly unresponsive: even basic
>> operations such as 'virsh list' are blocked. Reattaching a device to the
>> host can block for reasons such as a buggy driver, or because
>> initialization of the device itself takes a long time.
> 
> I am more interested in hearing about the problem with this buggy
> driver holding the domain lock during device reattach and compromising
> 'virsh' operations, and in seeing if there's something we can do to
> mitigate that, instead of creating an XML workaround for a driver problem.
> 
>>
>> We want to pass following extra options to resolve this:
>>
>>  1. *skipReAttach* (optional flag)
>>
>> In some cases we do not need to reattach the device to the host, as it
>> may be reserved only for guests. With this flag we can skip the reattach
>> operation on the host. We do not want to modify the managed flag, to
>> avoid regressions, so we are thinking of adding a new optional flag.
>>
>>  2. *reAttachDriverName* (optional flag)
>>
>> Name of the driver to which we want to reattach the device instead of
>> the default, to avoid reattaching to a buggy driver. Currently libvirt
>> asks the host to auto-select a driver for the device.
>>
>> Yes, we can use managed='no', but in that case the user has to take
>> responsibility for detaching the device before starting the domain,
>> which we do not want. Please let us know your views on this.
> 
> The case you mentioned above, "we do not need to reattach device to host
> as it may be reserved only for guests", is one of the most common uses
> we have for managed='no' AFAIK. The user/sysadm must detach the device
> from the host, but it's only one time. After that the device can remain
> detached from the host, and guests can use it freely as long as you
> don't reboot the host (or reattach the device back). The scenario
> you described fits the managed='no' mechanics fine IMO.
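
The one-time manual detach described above could be prepared along these lines. This is a minimal sketch, not part of libvirt: the helper names `nodedev_name` and `detach_command` are hypothetical, and it only builds the `virsh nodedev-detach` command line for the example device at 0000:00:1a.7 without running it.

```python
import re

# Matches a full PCI address such as 0000:00:1a.7 (domain:bus:slot.function).
PCI_ADDR = re.compile(r"^([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])$")


def nodedev_name(pci_addr):
    """Convert '0000:00:1a.7' into the libvirt node device name form."""
    match = PCI_ADDR.match(pci_addr.lower())
    if match is None:
        raise ValueError("not a full PCI address: %r" % pci_addr)
    return "pci_{}_{}_{}_{}".format(*match.groups())


def detach_command(pci_addr):
    """Build (but do not run) the one-time detach command for a device."""
    return ["virsh", "nodedev-detach", nodedev_name(pci_addr)]


if __name__ == "__main__":
    # The sysadmin would run the printed command once after each host boot.
    print(" ".join(detach_command("0000:00:1a.7")))
```

After the command has been run once, the hostdev element can use managed='no' and the device stays detached across guest restarts until the host reboots.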
> 
> If you want to automate the detach process, you can use a Libvirt QEMU
> hook (/etc/libvirt/hooks/qemu) to make the device detach when starting
> the domain, in case the device isn't already detached. Note that
> this has the same effect as the "skipReAttach" option you proposed.
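
A minimal sketch of such a hook, under stated assumptions: the guest uses managed='no', the target driver is vfio-pci and its module is already loaded, and the function names are illustrative only. It relies on libvirt invoking the hook as "qemu <guest> prepare begin -" with the domain XML on stdin, and it rebinds the device through sysfs because a hook script must not call back into libvirt.

```python
#!/usr/bin/env python3
# Sketch of /etc/libvirt/hooks/qemu: detach hostdev PCI devices before start.
import os
import sys
import xml.etree.ElementTree as ET

SYSFS_PCI = "/sys/bus/pci"
TARGET_DRIVER = "vfio-pci"  # assumption: driver used for passthrough


def hostdev_pci_addresses(domain_xml):
    """Return PCI addresses (e.g. '0000:00:1a.7') of all PCI hostdevs."""
    addresses = []
    for hostdev in ET.fromstring(domain_xml).iter("hostdev"):
        addr = hostdev.find("source/address")
        if hostdev.get("type") != "pci" or addr is None:
            continue
        addresses.append("{:04x}:{:02x}:{:02x}.{:x}".format(
            int(addr.get("domain", "0"), 16), int(addr.get("bus"), 16),
            int(addr.get("slot"), 16), int(addr.get("function"), 16)))
    return addresses


def current_driver(pci_addr, sysfs=SYSFS_PCI):
    """Name of the driver bound to the device, or None if unbound."""
    link = os.path.join(sysfs, "devices", pci_addr, "driver")
    return os.path.basename(os.readlink(link)) if os.path.islink(link) else None


def write(path, value):
    with open(path, "w") as handle:
        handle.write(value)


def detach(pci_addr, sysfs=SYSFS_PCI):
    """Bind the device to TARGET_DRIVER unless it already is. Returns True if rebound."""
    if current_driver(pci_addr, sysfs) == TARGET_DRIVER:
        return False
    device = os.path.join(sysfs, "devices", pci_addr)
    write(os.path.join(device, "driver_override"), TARGET_DRIVER)
    if os.path.islink(os.path.join(device, "driver")):
        write(os.path.join(device, "driver", "unbind"), pci_addr)
    write(os.path.join(sysfs, "drivers_probe"), pci_addr)
    return True


def main(argv):
    # Act only on the "prepare begin" phase, before the guest is started.
    if len(argv) < 4 or argv[2:4] != ["prepare", "begin"]:
        return 0
    for pci_addr in hostdev_pci_addresses(sys.stdin.read()):
        detach(pci_addr)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The script never rebinds the device to its host driver on shutdown, so the potentially slow reattach is avoided entirely.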
> 
> Making a design around faulty drivers isn't ideal. If the driver you're
> using starts to have problems with the detach operation as well,
> 'skipReAttach' will do you no good. You'll have to fall back to
> 'managed=no' to circumvent that.
> 
> Even if we discard the motivation, I'm not sure about the utility of having
> more forms of PCI assignment management (e.g.
> managed=yes|no|detach|reattach). managed=yes|no seems to cover most use
> cases where the device driver works properly.
> 
> 
> Laine, what do you think?

I have a vague memory of someone (may even have been me) proposing 
something similar several years ago, and the idea was shot down. I don't 
remember the exact context or naming, but the general idea was to have 
something like managed='detach-only' in order to have the advantage of 
all configuration being within libvirt, but eliminating the potential 
bad behavior associated with repeated re-binding of devices to drivers. 
I unfortunately also don't recall the reason the idea was nixed. Dan or 
Alex - do either of you have any memory of this?

As for myself, 1) I agree with Daniel's suggestion that it is important 
to find the root cause of the long delay rather than just covering it up 
with another obscure option that will need to be re-discovered by anyone 
who encounters the problem, and 2) every new bit that we add in makes 
the code more complex and so more prone to errors, and also makes 
configuration more complex and also more prone to errors. So while new 
options like this could be useful, they could also be a net loss (or 
not, it's hard to know without actually doing it, but once it's done it 
can't be "un-done" :-))