[libvirt] [RFC PATCH] hostdev: add support for "managed='detach'"

Laine Stump laine at laine.org
Tue Mar 15 18:21:35 UTC 2016


On 03/15/2016 01:00 PM, Daniel P. Berrange wrote:
> On Mon, Mar 14, 2016 at 03:41:48PM -0400, Laine Stump wrote:
>> Suggested by Alex Williamson.
>>
>> If you plan to assign a GPU to a virtual machine, but that GPU happens
>> to be the host system console, you likely want it to start out using
>> the host driver (so that boot messages/etc will be displayed), then
>> later have the host driver replaced with vfio-pci for assignment to
>> the virtual machine.
>>
>> However, in at least some cases (e.g. Intel i915) once the device has
>> been detached from the host driver and attached to vfio-pci, attempts
>> to reattach to the host driver only lead to "grief" (ask Alex for
>> details). This means that simply using "managed='yes'" in libvirt
>> won't work.
>>
>> And if you set "managed='no'" in libvirt then either you have to
>> manually run virsh nodedev-detach prior to the first start of the
>> guest, or you have to have a management application intelligent enough
>> to know that it should detach from the host driver, but never reattach
>> to it.
>>
>> This patch makes it simple/automatic to deal with such a case - it
>> adds a third "managed" mode for assigned PCI devices, called
>> "detach". It will detach ("unbind" in driver parlance) the device from
>> the host driver prior to assigning it to the guest, but when the guest
>> is finished with the device, will leave it bound to vfio-pci. This
>> allows re-using the device for another guest, without requiring
>> initial out-of-band intervention to unbind the host driver.
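
(To make the proposed syntax concrete, the hostdev element would look
like this - the PCI address is just an example:

    <hostdev mode='subsystem' type='pci' managed='detach'>
      <source>
        <address domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </source>
    </hostdev>

i.e. identical to managed='yes' except for the value of the managed
attribute.)
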
> You say that managed=yes causes pain upon re-attachment and that
> apps should use managed=detach to avoid it, but how do management
> apps know which devices are going to cause pain? Libvirt isn't
> providing any info on whether a particular device ID needs to
> use managed=yes vs managed=detach, and we don't want to be asking
> the user to choose between modes in openstack/ovirt IMHO. I think
> that's a fundamental problem with inventing a new value for managed
> here.

My suspicion is that in many/most cases users don't actually need the
device to be re-bound to the host driver after the guest is finished
with it, because they're only going to assign it to another guest
anyway. But managed='yes' is what's supplied, and it's the easiest way
to get a device set up for assignment to a guest, so that's what they
use.
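
(For comparison, the manual workflow with managed='no' is a one-time
detach before the guest's first start - the device name here is made
up:

    virsh nodedev-detach pci_0000_02_00_0

with no nodedev-reattach afterward.)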

As a matter of fact, all the extra churn of switching the driver back
and forth for devices that are only ever used while bound to vfio-pci
just wastes time, and makes it more likely that libvirt and its users
will trip over some strange kernel driver loading/unloading bug (one
such bug was reported recently; unfortunately the BZ record contains
customer info, so it's not publicly accessible :-( )
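
(For reference, what managed='yes' does on every guest start/stop
cycle is roughly the equivalent of the following sysfs writes - a
simplified sketch with a made-up device address; the real code also
deals with stub drivers, iommu groups, etc.:

    # guest start: unbind the host driver, bind vfio-pci
    echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind
    echo vfio-pci > /sys/bus/pci/devices/0000:02:00.0/driver_override
    echo 0000:02:00.0 > /sys/bus/pci/drivers_probe

    # guest shutdown: rebind the original host driver
    echo 0000:02:00.0 > /sys/bus/pci/devices/0000:02:00.0/driver/unbind
    echo > /sys/bus/pci/devices/0000:02:00.0/driver_override
    echo 0000:02:00.0 > /sys/bus/pci/drivers_probe

managed='detach' would do the first half on the guest's first start
and then skip the second half entirely.)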

So beyond the cases where this behavior is absolutely necessary, I
think it's useful at the user's discretion (and as I implied above, I
think that if they understood the function and the tradeoffs, most
people would choose managed='detach' rather than managed='yes').

(Alternately, we could come back to the discussion of having
persistent nodedevice config, with one of the configurables being
which devices should be bound to vfio-pci when libvirtd is started.
Did we maybe even talk about exactly that in the past? I can't
remember... That would of course preclude the use case where someone
1) normally wants to use the device on the host, but 2) occasionally
wants to assign it to a guest, and 3) is well aware that they'll need
to reboot the host before they can use the device on the host again. I
know, I know - "odd edge cases", and in particular "odd edge cases
only encountered by people who know other ways of working around the
problem" :-))


> Can you provide more details about the problems with re-attaching?
>
> Is this inherent to all VGA cards, or is it specific to the Intel
> i915, or specific to a kernel version or something else?
>
> I feel like this is something where libvirt should "do the right
> thing", since that's really what managed=yes is all about.
>
> e.g. if we have managed=yes and we see an i915, we should
> automatically skip re-attach for that device.


Alex can give a much better description of that than I can (I had told
git to Cc him on the original patch, but it seems it didn't do that;
I'm trying again). But what if such behavior exists now for a certain
set of VGA cards, and then gets fixed in the future? Would we continue
to forcibly skip re-attach for those devices? I understand the allure
of always doing the right thing without requiring config (and the
dislike of adding new, seemingly esoteric options), but I don't know
that libvirt has (or can get) the necessary info to make the correct
decision in all cases.
