[libvirt] [PATCH v2 4/5] qemu_hotplug: Fix a rare race condition when detaching a device twice

Peter Krempa pkrempa at redhat.com
Thu Mar 14 14:50:26 UTC 2019


On Thu, Mar 14, 2019 at 15:31:48 +0100, Michal Privoznik wrote:
> On 3/14/19 3:14 PM, Peter Krempa wrote:
> > On Thu, Mar 14, 2019 at 14:56:48 +0100, Michal Privoznik wrote:
> > > On 3/14/19 2:18 PM, Peter Krempa wrote:
> > > > On Thu, Mar 14, 2019 at 13:22:38 +0100, Michal Privoznik wrote:
> > 
> > [...]
> > 
> > > > 
> > > > How can this be considered success? Also this introduces a possible
> > > > regression. The DEVICE_DELETED event should be fired only after the
> > > > device was entirely unplugged. Claiming success before seeing the event
> > > > can lead to another race when qemu deleted the device from the internal
> > > > list so that 'device_del' does not see it any more but did not finish
> > > > cleanup fully.
> > > > 
> > > > We need to start the '*Remove' handler only after the DEVICE_DELETED
> > > > event was received.
> > > 
> > > I beg to differ. If we were to report error here users would see the API
> > > failing with error "Device not found". So they'd run 'virsh dumpxml' only to
> > > find the device there. I don't find such behaviour sane. If one API tells me
> > > a devie is not there then another one shall not tell otherwise.
> > 
> > Well. The user semantics can be confusing here. What we can't allow
> > though is that some of the steps done in the qemuDomainRemove*Device
> > will fail because qemu will still have some internal reference to some
> > backend object.
> 
> I'm not quite sure I follow. qemuDomainRemove*Device will be run exactly
> once. Not any more times. Running it more times is a problem, but I'm
> failing to see how my patch allows that. Can you shed more light into that
> please?

What I've meant is that qemuDomainRemove*Device shall never be called
earlier than DEVICE_DELETED is received.

What I've forgot to take into account is that we will still call
qemuDomainWaitForDeviceRemoval (as your comment mentions).

This means that the scenario I was outlining will not happen as success
from this API does not automatically mean we'll try to delete the
backends of the device (which I didn't notice at first).

I'd probably just delete the mention of dumping XML. I think the
statement that qemuDomainWaitForDeviceRemoval should be clear enough.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20190314/8f112843/attachment-0001.sig>


More information about the libvir-list mailing list