[vfio-users] AMDGPU rebind kernel bug

Gary gary at mups.co.uk
Sat Aug 18 08:50:15 UTC 2018


On 18/08/18 02:25, Alex Williamson wrote:
> This one is because the GPU is still bound to a VM IOMMU domain,
> probably because the audio function is still bound to the VM and
> userspace bindings are done at the group level.  This is a user/libvirt
> error, your scenario has allowed libvirt to attempt to rebind the GPU
> to a host driver while the audio device in the same IOMMU group is
> still bound to vfio-pci and in use by the user.  Had intel-iommu not
> hit a BUG_ON, vfio would have for the isolation violation.

I've switched to doing the unbind/bind manually (wrapped in a script)
rather than allowing virt-manager/libvirt to handle this and it appears
to be working. My steps are:

Unbind rx580 gpu and gpu audio
  echo -n "0000:01:00.0" > /sys/bus/pci/devices/0000:01:00.0/driver/unbind
  echo -n "0000:01:00.1" > /sys/bus/pci/devices/0000:01:00.1/driver/unbind

Bind to vfio
  echo -n "0000:01:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
  echo -n "0000:01:00.1" > /sys/bus/pci/drivers/vfio-pci/bind

Start the VM, use it, eject the card, shut down, and unbind again as
above, then rebind the rx580 to the host drivers:

  echo -n "0000:01:00.0" > /sys/bus/pci/drivers/amdgpu/bind
  echo -n "0000:01:00.1" > /sys/bus/pci/drivers/snd_hda_intel/bind

As long as both the gpu and the gpu audio are unbound first, rebinding
to either amdgpu or vfio-pci appears to work fine.
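
For anyone wanting to wrap the same sequence, here is a minimal sketch of
what such a script could look like. It is not the actual script used here:
the function names and the SYSFS_PCI override are hypothetical, and it
assumes vfio-pci already accepts these device IDs, as the bind writes above
imply. The key point is that both functions of the card are unbound before
either one is bound to a new driver.

```shell
#!/bin/sh
# Sketch only. SYSFS_PCI is a hypothetical override so the functions can be
# tried against a scratch directory instead of the real sysfs tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"
GPU="0000:01:00.0"      # rx580 gpu
AUDIO="0000:01:00.1"    # rx580 hdmi audio, same IOMMU group

# Detach a device from whichever driver currently owns it.
# Does nothing if the device has no driver bound.
unbind_dev() {
    if [ -e "$SYSFS_PCI/devices/$1/driver" ]; then
        printf '%s' "$1" > "$SYSFS_PCI/devices/$1/driver/unbind"
    fi
}

# Attach device $1 to driver $2.
bind_dev() {
    printf '%s' "$1" > "$SYSFS_PCI/drivers/$2/bind"
}

# Host -> VM: unbind both functions first, then bind both to vfio-pci.
to_vfio() {
    unbind_dev "$GPU" && unbind_dev "$AUDIO" &&
    bind_dev "$GPU" vfio-pci && bind_dev "$AUDIO" vfio-pci
}

# VM -> host: unbind both functions first, then bind to the host drivers.
to_host() {
    unbind_dev "$GPU" && unbind_dev "$AUDIO" &&
    bind_dev "$GPU" amdgpu && bind_dev "$AUDIO" snd_hda_intel
}
```

Run as root, the idea would be to call to_vfio before starting the VM and
to_host after it has shut down.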

I'll experiment with this for a while longer to make sure it's stable.
Then I'll take a look at libvirt and see about filing a bug report to
see if anything can be done to address the order of rebinding to avoid
the original lock-up.

Thank you for your help,

Gary



