[vfio-users] CPU stall on dynamic rebinding with nvidia proprietary drivers

Alex John alex at stty.io
Thu Jul 26 10:56:48 UTC 2018


Hello!

I have been experimenting with dynamically rebinding my GPU (nvidia <-> vfio-pci).
It works exactly twice and crashes the third time. Details and kernel logs
follow:

I boot the system with the GTX 1080 as the boot GPU; X starts fine and
everything is usable. When I need to boot one of my VMs, I:

  * first kill the X server and wait for it to shut down completely
  * unbind the device from the nvidia driver
  * bind it to vfio-pci
  * do the same for the HD audio device
  * unbind the framebuffer device by doing
      echo "efi-framebuffer.0" > \
        /sys/bus/platform/drivers/efi-framebuffer/unbind
  * restart X with a different configuration file that starts it on the intel
    iGPU (i965)
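
The unbind/bind steps above can be sketched as a small shell helper. This is
only an illustration, not necessarily the exact commands used here: the function
name rebind_pci and the SYSFS_ROOT variable are made up for this example, and it
assumes the kernel exposes the standard driver_override attribute for PCI
devices and that the target driver module is already loaded.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# SYSFS_ROOT defaults to the real sysfs; override it to dry-run on a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# rebind_pci <pci-address> <driver>
# Moves one PCI function from whatever driver owns it to <driver>.
rebind_pci() {
    dev="$1"    # e.g. 0000:01:00.0 (the GPU address seen in the log)
    drv="$2"    # e.g. vfio-pci or nvidia
    devdir="$SYSFS_ROOT/bus/pci/devices/$dev"

    # Release the device from its current driver, if it has one.
    if [ -e "$devdir/driver/unbind" ]; then
        echo "$dev" > "$devdir/driver/unbind"
    fi

    # Tell the PCI core which driver may claim this device, then bind it.
    echo "$drv" > "$devdir/driver_override"
    echo "$dev" > "$SYSFS_ROOT/bus/pci/drivers/$drv/bind"
}
```

Run as root, something like `rebind_pci 0000:01:00.0 vfio-pci` would handle the
GPU, and the same call with the HD audio function's address (often
0000:01:00.1, but that address is an assumption, it does not appear in the log)
would handle the audio device. Passing nvidia instead of vfio-pci reverses the
GPU step.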

This works fine. I get a virtual console that is modeset by the Intel driver
while I'm working on the iGPU. Once done, I kill X again, rebind the card to the
nvidia driver and start X on it. All good up to this point. However, I have now
lost the virtual console, and if I try to drop to it using Ctrl+Alt+F1 etc. my
CPU stalls. The relevant portion of the log is included below.
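
When reproducing this, it may help to record which driver owns the GPU and
whether efi-framebuffer.0 is bound before switching VTs. The following is a
minimal sketch under the assumption that sysfs is laid out in the usual way; the
function names pci_driver_of and efifb_bound and the SYSFS_ROOT variable are
hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical diagnostics, for illustration only.
# SYSFS_ROOT defaults to the real sysfs; override it to test on a fake tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the name of the driver bound to a PCI function, or "none".
pci_driver_of() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink to the owning driver's directory.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Succeed if efi-framebuffer.0 is currently bound to the efi-framebuffer driver.
efifb_bound() {
    [ -e "$SYSFS_ROOT/bus/platform/drivers/efi-framebuffer/efi-framebuffer.0" ]
}
```

For example, `pci_driver_of 0000:01:00.0` should print nvidia or vfio-pci
depending on the current state, and `efifb_bound && echo "efifb is bound"`
reports whether the EFI framebuffer has been claimed again.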

The full log is also at: https://bpaste.net/show/0f80d62444df

If anyone has encountered this before, any input would be appreciated. Thank you!

Alex

Log (from line 980 of the full log onwards):

 nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
 efifb: probing for efifb
 efifb: framebuffer at 0xd1000000, using 3072k, total 3072k
 efifb: mode is 1024x768x32, linelength=4096, pages=1
 efifb: scrolling: redraw
 efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
 Console: switching to colour frame buffer device 128x48
 fb0: EFI VGA frame buffer device
 nvidia-modeset: Allocated GPU:0 (GPU-43b5e24d-977e-4bb0-daf3-78b52338da5e) @ PCI:0000:01:00.0
 INFO: rcu_sched self-detected stall on CPU
 	1-....: (20999 ticks this GP) idle=ef6/1/4611686018427387906 softirq=4009/4009 fqs=5249 
 	 (t=21000 jiffies g=4392 c=4391 q=3580)
 NMI backtrace for cpu 1
 CPU: 1 PID: 5219 Comm: X Tainted: P           O      4.17.9-gentoo #2
 Hardware name: Micro-Star International Co., Ltd. MS-7B48/Z370-A PRO (MS-7B48), BIOS 2.40 03/08/2018
 Call Trace:
  <IRQ>
  dump_stack+0x46/0x5b
  nmi_cpu_backtrace+0xb3/0xc0
  ? lapic_can_unplug_cpu+0x90/0x90
  nmi_trigger_cpumask_backtrace+0x82/0xc0
  rcu_dump_cpu_stacks+0x90/0xbe
  rcu_check_callbacks+0x61f/0x870
  ? tick_sched_do_timer+0x50/0x50
  update_process_times+0x23/0x50
  tick_sched_handle+0x2f/0x40
  tick_sched_timer+0x32/0x70
  __hrtimer_run_queues+0xf5/0x260
  hrtimer_interrupt+0xe0/0x240
  smp_apic_timer_interrupt+0x5d/0x120
  apic_timer_interrupt+0xf/0x20
  </IRQ>
 RIP: 0010:os_io_read_dword+0x3/0x10 [nvidia]
 RSP: 0018:ffff9a4341dd3990 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
 RAX: 00000000ffffffff RBX: ffff8ec5ba56dc70 RCX: 0000000000000002
 RDX: 000000000000e00c RSI: 00000000000c0000 RDI: 000000000000e00c
 RBP: ffff8ec5ba56dc38 R08: 00000000000c436a R09: 00000000000c436a
 R10: 0000000000000001 R11: ffffffffc0a42cd0 R12: 000000000000c000
 R13: 00000000000017b6 R14: ffff8ec5ba56dc74 R15: ffff8ec5ba56dc78
  ? nv_rdtsc+0x170/0x270 [nvidia]
  _nv035627rm+0x84ea/0xbd70 [nvidia]
  ? _nv001281rm+0x85/0xb0 [nvidia]
  ? _nv027329rm+0x164/0x220 [nvidia]
  ? _nv028645rm+0x3c/0xe0 [nvidia]
  ? _nv001152rm+0x345/0x430 [nvidia]
  ? _nv001009rm+0x277/0x490 [nvidia]
  ? _nv026128rm+0x4e/0x240 [nvidia]
  ? _nv031518rm+0x1ae/0x5c0 [nvidia]
  ? _nv032605rm+0x189/0x210 [nvidia]
  ? _nv031527rm+0x5a8/0x5c0 [nvidia]
  ? _nv001019rm+0xe/0x20 [nvidia]
  ? _nv009553rm+0x1808/0x1cc0 [nvidia]
  ? rm_kernel_rmapi_op+0x8d/0x150 [nvidia]
  ? nvkms_call_rm+0x36/0x50 [nvidia_modeset]
  ? _nv002306kms+0x47/0x60 [nvidia_modeset]
  ? _nv002335kms+0x49/0x90 [nvidia_modeset]
  ? _nv002371kms+0x311/0x330 [nvidia_modeset]
  ? _nv002152kms+0x4a/0x100 [nvidia_modeset]
  ? up+0xd/0x50
  ? _nv031511rm+0x10/0x30 [nvidia]
  ? _nv031512rm+0x90/0xd0 [nvidia]
  ? _nv032569rm+0x15/0x20 [nvidia]
  ? _nv024969rm+0x12e/0x140 [nvidia]
  ? _nv034068rm+0xa6/0x140 [nvidia]
  ? _nv024969rm+0x12e/0x140 [nvidia]
  ? _nv034068rm+0xa6/0x140 [nvidia]
  ? _nv002339kms+0xc49/0xde0 [nvidia_modeset]
  ? _nv002339kms+0xc81/0xde0 [nvidia_modeset]
  ? nvkms_copyin+0x5/0x20 [nvidia_modeset]
  ? nvKmsIoctl+0x117/0x760 [nvidia_modeset]
  ? __kmalloc+0xf0/0x1d0
  ? _nv002339kms+0xc60/0xde0 [nvidia_modeset]
  ? nvkms_ioctl_common+0x36/0xe0 [nvidia_modeset]
  ? nvkms_ioctl_common+0xc2/0xe0 [nvidia_modeset]
  ? nvidia_frontend_unlocked_ioctl+0x39/0x40 [nvidia]
  ? do_vfs_ioctl+0x8b/0x5e0
  ? security_file_ioctl+0x2d/0x50
  ? ksys_ioctl+0x6b/0x80
  ? __x64_sys_ioctl+0x11/0x20
  ? do_syscall_64+0x43/0xf0
  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9



