[vfio-users] CPU stall on dynamic rebinding with nvidia proprietary drivers
Alex John
alex at stty.io
Thu Jul 26 10:56:48 UTC 2018
Hello!
I was experimenting with dynamically rebinding my GPU (nvidia <-> vfio-pci) and
it works exactly two times and crashes on the third time. More details and
kernel logs as follows:
I boot the system with the GTX 1080 as the boot GPU; X starts fine and
everything is usable. When I need to boot up one of my VMs, I:
* first kill X server, and wait for it to completely shut down
* unbind the device from the nvidia driver
* bind it to vfio-pci
* do the same for the HD audio device
* unbind the framebuffer device by doing
    echo "efi-framebuffer.0" > \
        /sys/bus/platform/drivers/efi-framebuffer/unbind
* restart X with a different configuration file that starts it on the intel
  iGPU (i965)
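For reference, the PCI unbind/bind steps above can be sketched roughly as
follows. This is not necessarily how I do it on my machine, just one common way
to express it with the kernel's driver_override and drivers_probe sysfs files.
The names SYSFS_ROOT and rebind_pci are made up for illustration, the GPU
address is the one from the log below, and the audio function address
0000:01:00.1 is an assumption. It needs root, and X must already be stopped.

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT and rebind_pci are hypothetical names.
# SYSFS_ROOT can be pointed at a scratch directory for dry runs.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# rebind_pci ADDR DRIVER
# Detach the PCI device ADDR from its current driver and hand it to DRIVER.
rebind_pci() {
    addr="$1"
    driver="$2"
    dev="$SYSFS_ROOT/bus/pci/devices/$addr"

    # Unbind from whichever driver currently owns the device, if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi

    # Restrict the next probe to the requested driver, then trigger the probe.
    echo "$driver" > "$dev/driver_override"
    echo "$addr" > "$SYSFS_ROOT/bus/pci/drivers_probe"
}

# Example calls (as root, with X stopped):
#   rebind_pci 0000:01:00.0 vfio-pci   # GPU, address taken from the log
#   rebind_pci 0000:01:00.1 vfio-pci   # HD audio function (assumed address)
# and later, to give the GPU back to the host:
#   rebind_pci 0000:01:00.0 nvidia
```

The efi-framebuffer unbind is a platform device, not a PCI one, so it stays a
separate echo as shown in the list.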
This works fine. I get a vtconsole that is modeset by the intel driver while
I'm working on the iGPU. Once done, I kill X again, rebind the card to the
nvidia driver and start X on it. All good up to this point. However, I've lost
the virtual console by now, and if I try to switch to it using Ctrl+Alt+F1 etc.,
my CPU stalls. The relevant portion of the log can be found below.
The full log is also at: https://bpaste.net/show/0f80d62444df
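In case it helps anyone reproduce this, the state of the virtual console
backends can be inspected through /sys/class/vtconsole before and after the
rebind. A minimal sketch, where list_vtconsoles and SYSFS_ROOT are hypothetical
names of my own; it only reads the name and bind files and changes nothing:

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT and list_vtconsoles are hypothetical names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print each vtconsole backend with its description and whether it is
# currently bound (bind=1) or not (bind=0).
list_vtconsoles() {
    for vt in "$SYSFS_ROOT"/class/vtconsole/vtcon*; do
        # Skip the unexpanded glob when no vtconsole entries exist.
        [ -d "$vt" ] || continue
        printf '%s: %s (bind=%s)\n' \
            "$(basename "$vt")" "$(cat "$vt/name")" "$(cat "$vt/bind")"
    done
}

# Example call:
#   list_vtconsoles
```

Comparing the output after each step would show which backend, if any, owns
the console at the point where the stall happens.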
If anyone has encountered this before any input would be appreciated. Thank you!
Alex
Log (L980 onwards):
nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
efifb: probing for efifb
efifb: framebuffer at 0xd1000000, using 3072k, total 3072k
efifb: mode is 1024x768x32, linelength=4096, pages=1
efifb: scrolling: redraw
efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0
Console: switching to colour frame buffer device 128x48
fb0: EFI VGA frame buffer device
nvidia-modeset: Allocated GPU:0 (GPU-43b5e24d-977e-4bb0-daf3-78b52338da5e) @ PCI:0000:01:00.0
INFO: rcu_sched self-detected stall on CPU
1-....: (20999 ticks this GP) idle=ef6/1/4611686018427387906 softirq=4009/4009 fqs=5249
(t=21000 jiffies g=4392 c=4391 q=3580)
NMI backtrace for cpu 1
CPU: 1 PID: 5219 Comm: X Tainted: P O 4.17.9-gentoo #2
Hardware name: Micro-Star International Co., Ltd. MS-7B48/Z370-A PRO (MS-7B48), BIOS 2.40 03/08/2018
Call Trace:
<IRQ>
dump_stack+0x46/0x5b
nmi_cpu_backtrace+0xb3/0xc0
? lapic_can_unplug_cpu+0x90/0x90
nmi_trigger_cpumask_backtrace+0x82/0xc0
rcu_dump_cpu_stacks+0x90/0xbe
rcu_check_callbacks+0x61f/0x870
? tick_sched_do_timer+0x50/0x50
update_process_times+0x23/0x50
tick_sched_handle+0x2f/0x40
tick_sched_timer+0x32/0x70
__hrtimer_run_queues+0xf5/0x260
hrtimer_interrupt+0xe0/0x240
smp_apic_timer_interrupt+0x5d/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
RIP: 0010:os_io_read_dword+0x3/0x10 [nvidia]
RSP: 0018:ffff9a4341dd3990 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 00000000ffffffff RBX: ffff8ec5ba56dc70 RCX: 0000000000000002
RDX: 000000000000e00c RSI: 00000000000c0000 RDI: 000000000000e00c
RBP: ffff8ec5ba56dc38 R08: 00000000000c436a R09: 00000000000c436a
R10: 0000000000000001 R11: ffffffffc0a42cd0 R12: 000000000000c000
R13: 00000000000017b6 R14: ffff8ec5ba56dc74 R15: ffff8ec5ba56dc78
? nv_rdtsc+0x170/0x270 [nvidia]
_nv035627rm+0x84ea/0xbd70 [nvidia]
? _nv001281rm+0x85/0xb0 [nvidia]
? _nv027329rm+0x164/0x220 [nvidia]
? _nv028645rm+0x3c/0xe0 [nvidia]
? _nv001152rm+0x345/0x430 [nvidia]
? _nv001009rm+0x277/0x490 [nvidia]
? _nv026128rm+0x4e/0x240 [nvidia]
? _nv031518rm+0x1ae/0x5c0 [nvidia]
? _nv032605rm+0x189/0x210 [nvidia]
? _nv031527rm+0x5a8/0x5c0 [nvidia]
? _nv001019rm+0xe/0x20 [nvidia]
? _nv009553rm+0x1808/0x1cc0 [nvidia]
? rm_kernel_rmapi_op+0x8d/0x150 [nvidia]
? nvkms_call_rm+0x36/0x50 [nvidia_modeset]
? _nv002306kms+0x47/0x60 [nvidia_modeset]
? _nv002335kms+0x49/0x90 [nvidia_modeset]
? _nv002371kms+0x311/0x330 [nvidia_modeset]
? _nv002152kms+0x4a/0x100 [nvidia_modeset]
? up+0xd/0x50
? _nv031511rm+0x10/0x30 [nvidia]
? _nv031512rm+0x90/0xd0 [nvidia]
? _nv032569rm+0x15/0x20 [nvidia]
? _nv024969rm+0x12e/0x140 [nvidia]
? _nv034068rm+0xa6/0x140 [nvidia]
? _nv024969rm+0x12e/0x140 [nvidia]
? _nv034068rm+0xa6/0x140 [nvidia]
? _nv002339kms+0xc49/0xde0 [nvidia_modeset]
? _nv002339kms+0xc81/0xde0 [nvidia_modeset]
? nvkms_copyin+0x5/0x20 [nvidia_modeset]
? nvKmsIoctl+0x117/0x760 [nvidia_modeset]
? __kmalloc+0xf0/0x1d0
? _nv002339kms+0xc60/0xde0 [nvidia_modeset]
? nvkms_ioctl_common+0x36/0xe0 [nvidia_modeset]
? nvkms_ioctl_common+0xc2/0xe0 [nvidia_modeset]
? nvidia_frontend_unlocked_ioctl+0x39/0x40 [nvidia]
? do_vfs_ioctl+0x8b/0x5e0
? security_file_ioctl+0x2d/0x50
? ksys_ioctl+0x6b/0x80
? __x64_sys_ioctl+0x11/0x20
? do_syscall_64+0x43/0xf0
? entry_SYSCALL_64_after_hwframe+0x44/0xa9