[libvirt] [RFC] handling hostdev save/load net config for non SR-IOV devices

Laine Stump laine at laine.org
Thu Jul 18 17:18:43 UTC 2019


On 7/18/19 11:56 AM, Daniel Henrique Barboza wrote:
>
>
> On 7/18/19 12:29 PM, Laine Stump wrote:
>> On 7/18/19 10:29 AM, Daniel Henrique Barboza wrote:
>>> Hi,
>>>
>>> I have a PoC that enables partial coldplug assignment of multifunction
>>> PCI devices in managed mode. At the moment, libvirt can't handle
>>> this scenario - the code detaches only the hostdevs listed in the XML,
>>> when in fact the whole IOMMU group needs to be detached. This can be
>>> verified by the fact that libvirt handles the unmanaged scenario
>>> well, as long as the user detaches the whole IOMMU group beforehand.
>>>
>>> I have played with 2 approaches. The one I am planning to contribute
>>> back is a change inside virHostdevGetPCIHostDeviceList() that
>>> adds the extra PCI devices for detach/re-attach in case a PCI
>>> multifunction device in managed mode is present in the XML.
>>
>>
>> If you're thinking of doing that automatically, then I should warn 
>> you that we had discussed that a long time ago, and decided that it 
>> was a bad idea to do it because it was likely someone would, e.g. try 
>> to assign an audio device to their guest that happened to be one 
>> function on a multifunction device that also contained a disk 
>> controller (or some other device) that the host needed for proper 
>> operation.
>>
>>
>
> Let's say that I have a multifunction PCI card with 4 functions, and I
> want a guest to use only function 0 of that card. At the moment, I'm
> only able to do that if I manually execute nodedev-detach on all 4
> functions beforehand and use function 0 as a hostdev with
> managed='no'.
>
> What I've implemented is a way of doing the detach/re-attach of the
> whole IOMMU group for the user, if the hostdev is set with
> managed='yes' (and perhaps I should also consider verifying the
> 'multifunction=yes' attribute as well, for more clarity). I am not
> trying to assign all the devices in the IOMMU group to the guest - not
> sure if that's what you were talking about up there, but I'm happy to
> emphasize that's not the case.


No, we're talking about the same thing. We specifically talked about the 
possibility of doing exactly this several years ago, and decided against it.


>
> Now, yes, if the user is unaware of the consequences of detaching all
> devices of the IOMMU group from the host, bad things can happen. If
> that's what you're saying, fair enough. I can make an argument about
> how we can't shield the user from his/her own 'unawareness' forever,
> but in the end it's better to be on the safe side.


We really shouldn't do anything with any host device if it's not 
explicitly given in the config.
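
As an illustration of that rule, here is a minimal Python sketch (not
libvirt code) that reports which members of a device's IOMMU group are
*not* explicitly listed in a config. It relies on the kernel's sysfs
layout, where each PCI device has an iommu_group/devices directory. The
function names and the sysfs_root parameter are assumptions made for
this example; sysfs_root exists only so the logic can be exercised
against a fake directory tree.

```python
import os


def iommu_group_members(pci_addr, sysfs_root="/sys"):
    """Return the sorted PCI addresses that share an IOMMU group with pci_addr."""
    group_dir = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr,
                             "iommu_group", "devices")
    return sorted(os.listdir(group_dir))


def unlisted_group_members(configured, sysfs_root="/sys"):
    """Map each configured device to the group members missing from the config.

    Devices whose whole group is already listed are omitted from the result.
    """
    configured = set(configured)
    result = {}
    for addr in sorted(configured):
        extra = [m for m in iommu_group_members(addr, sysfs_root)
                 if m not in configured]
        if extra:
            result[addr] = extra
    return result
```

A management tool could refuse to start (or at least warn) when
unlisted_group_members() returns a non-empty mapping, rather than
silently detaching those devices from the host.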


>
>
>> It may be that in *your* particular case, you understand that the 
>> functions you don't want to assign to the guest are not otherwise 
>> used, and it's not dangerous to suddenly detach them from their host 
>> driver. But you can't assume that will always be the case.
>>
>>
>> If you *really* can't accept just assigning all the devices in that 
>> IOMMU group to the guest (thus making them all explicitly listed in 
>> the config, and obvious to the administrator that they won't be 
>> available on the host) and simply not using them, then you either 
>> need to separately detach those particular functions from the host, 
>> or come up with a way of having the domain config explicitly list 
>> them as "detached from the host but not actually attached to the guest".
>>
>
> I can live with that - it will automate the detach/re-attach process,
> which is my goal here, and it forces the user to know exactly what is
> going to be detached from the host, minimizing errors. If no one is
> against adding an extra parameter 'unassigned=true' to the hostdev in
> these cases, I can make this happen.


I don't have any ideological objection to that (maybe there's a
better name for the attribute, but I can't think of one).
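
To make the proposal concrete, the following Python sketch builds the
kind of <hostdev> element being discussed. The mode/type/managed
attributes and the <source><address .../> child follow the usual
hostdev layout; the 'unassigned' attribute is only the name floated in
this thread and is purely hypothetical here, as is the helper function
itself.

```python
import xml.etree.ElementTree as ET


def hostdev_xml(domain, bus, slot, function, managed=True, unassigned=False):
    """Build a PCI <hostdev> element and return it as an XML string."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci",
                         managed="yes" if managed else "no")
    if unassigned:
        # Hypothetical attribute from this thread: detach the device from
        # the host, but do not attach it to the guest.
        hostdev.set("unassigned", "true")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "address",
                  domain="0x%04x" % domain, bus="0x%02x" % bus,
                  slot="0x%02x" % slot, function="0x%x" % function)
    return ET.tostring(hostdev, encoding="unicode")
```

With something like this, function 0 would be emitted normally and
functions 1-3 of the same card would be emitted with unassigned=True,
so every device that leaves the host is visible in the config.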


But to back up a bit - what is it about managed='yes' that makes you
want to do it that way instead of managed='no'? Do you really ever need
the devices to be bound to the host driver? Or are you just using
managed='yes' because there's no standard/convenient place to
configure devices to be permanently bound to vfio-pci immediately when
the host boots? Truthfully, a great majority of the most troubling bugs
with device assignment are due to use of managed='yes', since it
exercises the kernel's device driver binding/unbinding code so much and
reveals strange races in the (usually kernel) code. Yet in almost all
cases the devices being assigned to guests are *never* used directly by
the host anyway, so there is no point in repeatedly rebinding the host
driver to the device - it just sits there unused [1] until the next time
it is needed by a guest, at which point it gets rebound to vfio-pci,
rinse, repeat.


I think we should spend more time making it easier to have devices
"pre-bound" to vfio-pci at boot time, so that we could discourage use
of managed='yes' (not "instead of" what you're doing, but "in addition
to" it).
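
For reference, a rough Python sketch of what binding a device to
vfio-pci once, outside of libvirt, might look like using the kernel's
driver_override mechanism in sysfs. It assumes the vfio-pci module is
already loaded and that the writes are performed as root. The function
names and the sysfs_root parameter are invented for this example, and
how such a step would be hooked into host boot is deliberately left
open.

```python
import os


def vfio_bind_plan(pci_addr, sysfs_root="/sys"):
    """Return the ordered (path, value) sysfs writes that bind pci_addr to vfio-pci."""
    dev = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr)
    # 1. Tell the PCI core that only vfio-pci may claim this device.
    plan = [(os.path.join(dev, "driver_override"), "vfio-pci")]
    # 2. If a host driver currently owns the device, unbind it first.
    if os.path.exists(os.path.join(dev, "driver")):
        plan.append((os.path.join(dev, "driver", "unbind"), pci_addr))
    # 3. Ask the PCI core to re-probe; the override selects vfio-pci.
    plan.append((os.path.join(sysfs_root, "bus", "pci", "drivers_probe"),
                 pci_addr))
    return plan


def apply_plan(plan):
    """Perform the writes in order (needs root against the real /sys)."""
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)
```

Keeping the plan separate from the writes makes it possible to print or
review what would be touched before anything on the host changes. A
device bound this way would then be used with managed='no'.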


[1] (in the case of network device VFs, often it isn't "unused", but
instead is *improperly* used on the host due to NetworkManager insisting
on setting the device IFF_UP and starting up a DHCP client. So it's not
just finding races in the kernel driver binding/initialization code, but
also falling prey to (imho) the poor choice of NM to force all
interfaces up and default to running DHCP on all unconfigured interfaces.)

