[vfio-users] lspci and vfio_pci_release deadlock when destroy a pci passthrough VM

Wuzongyong (Euler Dept) cordius.wu at huawei.com
Wed Mar 20 13:32:33 UTC 2019


Hi Alex,

I noticed the patch you posted at https://lkml.org/lkml/2019/2/18/1315
You said that the previous commit you pushed may be prone to deadlock. Could you please share the details of how to reproduce the deadlock, if you know them?
I hit a similar problem: every lspci command went into D state and libvirtd went into Z state when destroying a VM with a GPU passed through. The stacks look like this:

2019-03-20T13:37:14.726514+07:00|err|kernel[-]|[2427373.553663] INFO: task ps:112058 blocked for more than 120 seconds.
2019-03-20T13:37:14.726576+07:00|err|kernel[-]|[2427373.553667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-03-20T13:37:14.726599+07:00|info|kernel[-]|[2427373.553669] ps              D 0000000000000000     0 112058      1 0x00000004
2019-03-20T13:37:14.726620+07:00|warning|kernel[-]|[2427373.553673] Call Trace:
2019-03-20T13:37:14.726640+07:00|warning|kernel[-]|[2427373.553682]  [<ffffffff816b7069>] schedule_preempt_disabled+0x29/0x70
2019-03-20T13:37:14.726668+07:00|warning|kernel[-]|[2427373.553684]  [<ffffffff816b4a21>] __mutex_lock_slowpath+0xe1/0x170
2019-03-20T13:37:14.726689+07:00|warning|kernel[-]|[2427373.553689]  [<ffffffff816b400f>] mutex_lock+0x1f/0x2f
2019-03-20T13:37:14.726707+07:00|warning|kernel[-]|[2427373.553695]  [<ffffffff81379337>] pci_bus_save_and_disable+0x37/0x70
2019-03-20T13:37:14.726725+07:00|warning|kernel[-]|[2427373.553697]  [<ffffffff8137aeb8>] pci_try_reset_bus+0x38/0x80
2019-03-20T13:37:14.726743+07:00|warning|kernel[-]|[2427373.553730]  [<ffffffffa0261045>] vfio_pci_release+0x3d5/0x430 [vfio_pci]
2019-03-20T13:37:14.726761+07:00|warning|kernel[-]|[2427373.553737]  [<ffffffffa0260640>] ? vfio_pci_rw+0xc0/0xc0 [vfio_pci]
2019-03-20T13:37:14.726779+07:00|warning|kernel[-]|[2427373.553745]  [<ffffffffa02529f2>] vfio_device_fops_release+0x22/0x40 [vfio]
2019-03-20T13:37:14.726798+07:00|warning|kernel[-]|[2427373.553751]  [<ffffffff812179dc>] __fput+0xec/0x260
2019-03-20T13:37:14.726821+07:00|warning|kernel[-]|[2427373.553754]  [<ffffffff81217c8e>] ____fput+0xe/0x10
2019-03-20T13:37:14.726840+07:00|warning|kernel[-]|[2427373.553758]  [<ffffffff810b684a>] task_work_run+0xaa/0xe0
2019-03-20T13:37:14.726858+07:00|warning|kernel[-]|[2427373.553763]  [<ffffffff8102ac12>] do_notify_resume+0x92/0xb0
2019-03-20T13:37:14.726876+07:00|warning|kernel[-]|[2427373.553767]  [<ffffffff816c264f>] int_signal+0x12/0x17
2019-03-20T13:37:14.726892+07:00|err|kernel[-]|[2427373.553771] INFO: task lspci:139540 blocked for more than 120 seconds.
2019-03-20T13:37:14.726910+07:00|err|kernel[-]|[2427373.553772] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-03-20T13:37:14.726929+07:00|info|kernel[-]|[2427373.553773] lspci           D 0000000000000000     0 139540 139539 0x00000000
2019-03-20T13:37:14.726948+07:00|warning|kernel[-]|[2427373.553776] Call Trace:
2019-03-20T13:37:14.726970+07:00|warning|kernel[-]|[2427373.553778]  [<ffffffff816b5f79>] schedule+0x29/0x70
2019-03-20T13:37:14.726989+07:00|warning|kernel[-]|[2427373.553782]  [<ffffffff81370ca0>] pci_wait_cfg+0xa0/0x110
2019-03-20T13:37:14.727006+07:00|warning|kernel[-]|[2427373.553787]  [<ffffffff810cfe40>] ? wake_up_state+0x20/0x20
2019-03-20T13:37:14.727023+07:00|warning|kernel[-]|[2427373.553790]  [<ffffffff81370e15>] pci_user_read_config_dword+0x105/0x110
2019-03-20T13:37:14.727043+07:00|warning|kernel[-]|[2427373.553794]  [<ffffffff8137e974>] pci_read_config+0x114/0x2c0
2019-03-20T13:37:14.727063+07:00|warning|kernel[-]|[2427373.553799]  [<ffffffff811f4835>] ? __kmalloc+0x55/0x240
2019-03-20T13:37:14.727084+07:00|warning|kernel[-]|[2427373.553804]  [<ffffffff812992fe>] read+0xde/0x1f0
2019-03-20T13:37:14.727103+07:00|warning|kernel[-]|[2427373.553807]  [<ffffffff81215a5f>] vfs_read+0x9f/0x170
2019-03-20T13:37:14.727123+07:00|warning|kernel[-]|[2427373.553809]  [<ffffffff81216812>] SyS_pread64+0x92/0xc0
2019-03-20T13:37:14.727141+07:00|warning|kernel[-]|[2427373.553812]  [<ffffffff816c22ef>] system_call_fastpath+0x1c/0x21

It seems that lspci and vfio_pci_release are deadlocked.
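
To make the two stacks easier to compare, here is a minimal Python sketch that pulls each hung-task report out of a log in the layout shown above and lists the reliable frames per task. The function names parse_hung_tasks and summarize are made up for illustration, and the regular expressions assume the "timestamp|level|kernel[-]|[uptime] message" layout of this particular log; other syslog formats would need a different pattern.

```python
import re

# Assumed log layout, taken from the excerpt above:
#   <timestamp>|<level>|kernel[-]|[<uptime>] <message>
LINE_RE = re.compile(r"^[^|]*\|[^|]*\|kernel\[-\]\|\[\s*\d+\.\d+\]\s?(.*)$")
# Header printed by the kernel hung-task detector.
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
# One stack frame: "[<addr>] symbol+0xoff/0xlen"; a leading "?" marks an unreliable frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\?\s+)?([^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_hung_tasks(text):
    """Return one dict per hung-task report: {'comm', 'pid', 'frames'}."""
    tasks = []
    current = None
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not a kernel line in the assumed layout
        msg = m.group(1)
        t = TASK_RE.search(msg)
        if t:
            current = {"comm": t.group(1), "pid": int(t.group(2)), "frames": []}
            tasks.append(current)
            continue
        f = FRAME_RE.search(msg)
        # Skip frames marked "?" because the unwinder is not sure about them.
        if f and current is not None and not f.group(1):
            current["frames"].append(f.group(2))
    return tasks


def summarize(tasks, depth=6):
    """Format each task as 'comm:pid innermost <- caller <- ...'."""
    return [
        f"{t['comm']}:{t['pid']} " + " <- ".join(t["frames"][:depth])
        for t in tasks
    ]
```

Feeding the log text to parse_hung_tasks and printing the result of summarize gives one line per blocked task, innermost frame first. It only condenses what is already in the log; it does not show which task holds the lock the other is waiting for.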

Thanks,
Zongyong Wu
