[libvirt] [PATCH] execute netdev_del after receiving DEVICE_DELETED event

Laine Stump lstump at redhat.com
Fri Mar 28 15:21:57 UTC 2014


On 03/28/2014 01:30 PM, xiexiangyou wrote:
> Thanks for your reply.
>
> On 2014/3/27 22:14, Jiri Denemark wrote:
>
>> On Thu, Mar 27, 2014 at 20:51:24 +0800, x00221466 wrote:
>>> Hi,
>>>
>>> When live detaching a virtual net device, such as a virtio NIC,
>>> RTL8139, or E1000, there are some problems:
>>>
>>> (1) If the Guest OS doesn't support PCI device hot-plugging and the
>>> virtual network device is detached through Libvirt, the "net device" in
>>> Qemu will still exist, but the "hostnet" (tap) in Qemu will be removed, so
>>> the net device in the Guest OS will stop working.
>>>
>>> (2) If the NIC is ejected in the Guest OS, Qemu will remove the "net device"
>>> and then send DEVICE_DELETED to Libvirt. Libvirt receives the event
>>> in the event-loop thread and releases the info of the net device in
>>> the qemuDomainRemoveNetDevice func, but the "hostnet" in Qemu still exists.
>>> So the next live attach of a virtual net device will fail because of
>>> "Duplicate ID".
>>>
>>> #virsh attach-device win2008_st_r2_64 net.xml --live
>>> error: Failed to attach device from net.xml
>>> error: internal error: unable to execute QEMU command 'netdev_add':
>>> Duplicate ID 'hostnet0' for netdev
>>>
>>> (3) In addition, in qemuDomainDetachNetDevice, the detach net device func,
>>> the "netdev_del" command is sent immediately after the "device_del"
>>> command. So the tap device is forcibly removed before the net device
>>> is completely removed.
>>>
>>> So I think it's more logical to send the Qemu command "netdev_del"
>>> only after receiving the DEVICE_DELETED event. That avoids a mismatch
>>> of device info between the Libvirt side and the Qemu side.
>> This sounds like it could be correct, although I'd prefer Laine to
>> express his opinion on this since he knows the corners in network device
>> assignment...
>>
>>> I create a thread in qemuDomainRemoveDevice, the handler of the DEVICE_DELETED
>>> event, to execute the QEMU command "netdev_del".
>> Hmm, it took me some time to realize why you'd need to do this. It's
>> because qemuDomainRemoveDevice is run from a DEVICE_DELETED event
>> handler and thus it cannot talk back to the monitor, right? In that
>
> Yep! Sending a Qemu monitor command in the event handler is not allowed, so I create
> a new thread to do this.
>
>> case, I suggest spawning a thread for qemuDomainRemoveDevice itself
>> within the event handler (qemuProcessHandleDeviceDeleted) so that all
>> qemuDomainRemove* methods can talk to monitor if they need to.
>
> I will modify it as you suggest.
>
>> To make the changes easier to follow, please do the change in two
>> patches. The first one to move qemuDomainRemoveDevice into a new thread
>> and the second one to move qemuMonitorRemoveNetdev and
>> qemuMonitorRemoveHostNetwork calls inside qemuDomainRemoveNetDevice.
>>
>> But first, wait for Laine's input, please.

Well, the level of my knowledge was that I noticed the problem caused by
the asynchronous nature of device_del (exactly the error message that
you're reporting) and reported this to QEMU, asking for an event to let
us know when it is okay to reuse a device ID (i.e. the DEVICE_DELETED
event). It appears that this isn't always good enough, though, so
*something* apparently needs to be done.

My understanding is that the problem is caused by the netdev_del being
executed too soon after device_del, and then the device ID is forever
lost due to the unclean "cleanup", is that correct? If so, then your
solution sounds correct.

But does netdev_del complete synchronously? If not, then we will need a
completion event for that as well.
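
To make the ordering concrete, here is a toy Python model of the proposal.
It is only a sketch under stated assumptions, not libvirt or QEMU code:
ToyQemu, handle_events and the "backends" map are hypothetical names made
up for illustration, and netdev_del is assumed to complete synchronously
(which is exactly the open question above). It shows the "Duplicate ID"
failure when the backend is left behind, and that issuing netdev_del from
the DEVICE_DELETED handling path frees the ID for reuse.

```python
class ToyQemu:
    """Toy stand-in for QEMU's NIC bookkeeping (hypothetical, not real QEMU)."""

    def __init__(self):
        self.netdevs = set()   # host-side backends, e.g. "hostnet0"
        self.devices = {}      # guest-visible NICs: device id -> netdev id
        self.events = []       # queued (event name, device id) pairs

    def netdev_add(self, netdev_id):
        if netdev_id in self.netdevs:
            raise RuntimeError(f"Duplicate ID '{netdev_id}' for netdev")
        self.netdevs.add(netdev_id)

    def device_add(self, dev_id, netdev_id):
        self.devices[dev_id] = netdev_id

    def guest_releases(self, dev_id):
        # The guest lets go of the NIC; only now does the device disappear
        # and the DEVICE_DELETED event get queued.
        del self.devices[dev_id]
        self.events.append(("DEVICE_DELETED", dev_id))

    def netdev_del(self, netdev_id):
        # Assumption: completes synchronously in this toy model.
        self.netdevs.discard(netdev_id)


def handle_events(qemu, backends):
    """Drain queued events; on DEVICE_DELETED, remove the matching backend."""
    while qemu.events:
        name, dev_id = qemu.events.pop(0)
        if name == "DEVICE_DELETED":
            qemu.netdev_del(backends.pop(dev_id))


qemu = ToyQemu()
backends = {"net0": "hostnet0"}   # management-side record: device -> backend
qemu.netdev_add("hostnet0")
qemu.device_add("net0", "hostnet0")

# The guest ejects the NIC, but nobody has removed the backend yet.
qemu.guest_releases("net0")
try:
    qemu.netdev_add("hostnet0")
    duplicate = False
except RuntimeError:
    duplicate = True              # same failure as in the report

# Proposed ordering: netdev_del runs only after DEVICE_DELETED is seen.
handle_events(qemu, backends)
qemu.netdev_add("hostnet0")       # the ID can now be reused
```

In the real code the handler cannot talk to the monitor directly, so the
equivalent of handle_events would have to run in a separate thread, as Jiri
suggested.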



