[libvirt] qapi DEVICE_DELETED event issued *before* instance_finalize?!

Alex Williamson alex.williamson at redhat.com
Thu Sep 1 23:11:48 UTC 2016


Hey,

I'm out of my QOM depth, so I'll just beg for help in advance.  While
testing vfio-pci hotunplug, I noticed that the host seems to be trying
to reclaim the device before QEMU is actually done with it.  There's a
very short race where libvirt has seen the DEVICE_DELETED event and
tries to unbind the physical device from vfio-pci.  The use count is
clearly non-zero, because the host driver tries to send a device
request, but that event channel has already been torn down.  Nearly
immediately after, QEMU finally releases the device, but we can't do a
proper reset due to some issues with device references in the kernel.

When I run gdb on QEMU with breakpoints at
qapi_event_send_device_deleted() and vfio_instance_finalize(), the
QAPI event happens first.  Clearly this is horribly wrong, right?  I
can't unmap my references to the vfio device file until my
instance_finalize is called, so I'm always going to have that open when
libvirt takes the DEVICE_DELETED event as a cue to return the device to
host drivers.  The call chains look like this:

#0  qapi_event_send_device_deleted (has_device=true, 
    device=0x7f5ca3e36fb0 "hostdev0", 
    path=0x7f5c89e84fe0 "/machine/peripheral/hostdev0", 
    errp=0x7f5ca241f9e8 <error_abort>) at qapi-event.c:412
#1  0x00007f5ca1701608 in device_unparent (obj=0x7f5ca43ffc00)
    at hw/core/qdev.c:1115
#2  0x00007f5ca18b7891 in object_finalize_child_property (obj=0x7f5ca380f500, 
    name=0x7f5ca3f21da0 "hostdev0", opaque=0x7f5ca43ffc00) at qom/object.c:1362
#3  0x00007f5ca18b56b2 in object_property_del_child (obj=0x7f5ca380f500, 
    child=0x7f5ca43ffc00, errp=0x0) at qom/object.c:422
#4  0x00007f5ca18b5790 in object_unparent (obj=0x7f5ca43ffc00)
    at qom/object.c:441
#5  0x00007f5ca16c1f31 in acpi_pcihp_eject_slot (s=0x7f5ca4c41268, bsel=0, 
    slots=4) at hw/acpi/pcihp.c:139


#0  vfio_instance_finalize (obj=0x7f5ca43ffc00)
    at /net/gimli/home/alwillia/Work/qemu.git/hw/vfio/pci.c:2731
#1  0x00007f5ca18b57c0 in object_deinit (obj=0x7f5ca43ffc00, 
    type=0x7f5ca376f490) at qom/object.c:448
#2  0x00007f5ca18b5831 in object_finalize (data=0x7f5ca43ffc00)
    at qom/object.c:462
#3  0x00007f5ca18b6782 in object_unref (obj=0x7f5ca43ffc00) at qom/object.c:896
#4  0x00007f5ca1550cc0 in memory_region_unref (mr=0x7f5ca43fff00)
    at /net/gimli/home/alwillia/Work/qemu.git/memory.c:1476
#5  0x00007f5ca1553886 in do_address_space_destroy (as=0x7f5ca43ffe10)
    at /net/gimli/home/alwillia/Work/qemu.git/memory.c:2272


It appears that DEVICE_DELETED only means that the VM is done with the
device, but libvirt interprets it as meaning that QEMU is done with it.
Which is correct?  Do we need a new event or do we need to fix the
ordering of this event?  An ordering fix would be more compatible with
existing libvirt.  Thanks,

Alex



