[libvirt] [PATCH] qemu: call drive_unplug in DetachPciDiskDevice

Daniel P. Berrange berrange at redhat.com
Thu Oct 21 17:53:54 UTC 2010


On Thu, Oct 21, 2010 at 12:43:14PM -0500, Ryan Harper wrote:
> * Daniel P. Berrange <berrange at redhat.com> [2010-10-21 12:10]:
> > On Thu, Oct 21, 2010 at 11:53:29AM -0500, Ryan Harper wrote:
> > > * Daniel P. Berrange <berrange at redhat.com> [2010-10-21 11:46]:
> > > > On Thu, Oct 21, 2010 at 08:50:35AM -0500, Ryan Harper wrote:
> > > > > Currently libvirt doesn't confirm whether the guest has responded to the
> > > > > disk removal request.  In some cases this can leave the guest with
> > > > > continued access to the device while the mgmt layer believes that it has
> > > > > been removed.  With a recent qemu monitor command[1] we can
> > > > > deterministically revoke a guests access to the disk (on the QEMU side)
> > > > > to ensure no futher access is permitted.
> > > > > 
> > > > > This patch adds support for the drive_unplug() command and introduces it
> > > > > in the disk removal paths.  There is some discussion to be had about how
> > > > > to handle the case where the guest is running in a QEMU without this
> > > > > command (and the fact that we currently don't have a way of detecting
> > > > > what monitor commands are available).
> > > > 
> > > > Basically we try to run the command and then catch the failure.
> > > > 
> > > > For QMP, we can check for a error with a class of 'CommandNotFound',
> > > > 
> > > > For HMP, QEMU will print 'unknown command' in the reply.
> > > > 
> > > > Neither is ideal, since neither is a guarenteed part of the monitor
> > > > interface, but it is all we have to go on, and ensure other critical
> > > > errors will still be treated as fatal by libvirt.
> > > 
> > > Yeah, this sorta already works without explictly catching the error
> > > (unless you want to display a different message at caller level than
> > > command not found).
> > 
> > This would cause the hot-unplug operation to completely terminate
> > though, whereas we want it to carry on, if the command is not
> > present. If we get a command-not-found, then we should syslog a 
> > warning, and then return '0' (ie success) to the caller of
> > qemuMonitorDriveUnplug
> > 
> > Alternatively, return '+1' to the caller, and make the callers
> > check for '+1', syslog it and treat it as non-fatal. This is
> > closer to the approach we use for handling 'balloon device not
> > present' error when code runs the 'info balloon' method.
> > 
> > > > > My current implementation assumes that if you don't have a QEMU with
> > > > > this capability that we should fail the device removal.  This is a
> > > > > strong statement around hotplug that isn't consistent with previous
> > > > > releases so I'm open to other approachs, but given the potential data
> > > > > leakage problem hot-remove can lead to without drive_unplug, I think
> > > > > it's the right thing to do.
> > > > 
> > > > I don't think we can do this, since it obviously breaks every single
> > > > existing deployment out there. Users who have sVirt enabled will
> > > > have a level of protection from the data leakage, so I don't think
> > > > it is a severe problem. 
> > > 
> > > Yeah, I was thinking we could re-work the logic to only completely fail
> > > in the case where the command wasn't found/successful *and* we don't
> > > have sVirt enabled.
> > 
> > I'm still wary of doing that because of the significant negative
> > impact on existing deployments out there, even in the case where
> > hotunplug would complete just fine. I think we'll just need to
> > syslog the potential issue and people can upgrade their QMEU if
> > they need the extra security
> 
> Thinking on this again; if we don't have a confirmation that the guest
> has removed the pci device (independent of whether we've revoked access
> to the underlying block device); then I really think we should fail the
> device removal; at the very least we cannot update libvirt state to
> indicate that the slot is free for re-assignment.

The trouble with this is that we'd be breaking 100% of hotunplug cases
for old QEMU, even though only 1% of cases actually exhibit any problem
in the real world. Accidentally trying to re-use a slot that isn't free
is a fairly minor issue. I agree we need to deal with this properly long
term, but I don't want to gratuitously break existing deployments yet.
Once we have a plan for supporting this properly across libvirt+qemu,
then we can re-evaluate whether we should be stricter for old qemu. I
think we should focus the first patches on the task of supporting the
new drive unplug command which has clear immediate benefit for everyone,
and consider the slot re-use problem in future dev work.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London    -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org        -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|




More information about the libvir-list mailing list