[libvirt] [PATCH] Fix race condition when destroying guests
Richard W.M. Jones
rjones at redhat.com
Fri Jan 18 15:42:46 UTC 2013
On Fri, Jan 18, 2013 at 02:39:11PM +0000, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" <berrange at redhat.com>
>
> When running virDomainDestroy, we need to make sure that no other
> background thread cleans up the domain while we're doing our work.
> This can happen if we release the domain object while in the
> middle of work, because the monitor might detect EOF in this window.
> For this reason we have a 'beingDestroyed' flag to stop the monitor
> from doing its normal cleanup. Unfortunately this flag was only
> being used to protect qemuDomainBeginJob, and not qemuProcessKill
>
> This left open a race condition where either libvirtd could crash,
> or alternatively report bogus error messages about the domain already
> having been destroyed to the caller
>
> Signed-off-by: Daniel P. Berrange <berrange at redhat.com>
> ---
> src/qemu/qemu_driver.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index c28c223..2a6f381 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -2121,24 +2121,29 @@ qemuDomainDestroyFlags(virDomainPtr dom,
>
> qemuDomainSetFakeReboot(driver, vm, false);
>
> +
> + /* We need to prevent monitor EOF callback from doing our work (and sending
> + * misleading events) while the vm is unlocked inside BeginJob/ProcessKill API
> + */
> + priv->beingDestroyed = true;
> +
> /* Although qemuProcessStop does this already, there may
> * be an outstanding job active. We want to make sure we
> * can kill the process even if a job is active. Killing
> * it now means the job will be released
> */
> if (flags & VIR_DOMAIN_DESTROY_GRACEFUL) {
> - if (qemuProcessKill(driver, vm, 0) < 0)
> + if (qemuProcessKill(driver, vm, 0) < 0) {
> + priv->beingDestroyed = false;
> goto cleanup;
> + }
> } else {
> - if (qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE) < 0)
> + if (qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE) < 0) {
> + priv->beingDestroyed = false;
> goto cleanup;
> + }
> }
>
> - /* We need to prevent monitor EOF callback from doing our work (and sending
> - * misleading events) while the vm is unlocked inside BeginJob API
> - */
> - priv->beingDestroyed = true;
> -
> if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_DESTROY) < 0)
> goto cleanup;
>
> --
> 1.8.0.2
This is a really difficult bug to reproduce using libguestfs. I've
only reproduced it a handful of times in about 3 months.
I would say, push it, based on:
(1) Rationale and patch looks sane.
(2) It fixes your minimal test case (which is a strong enough reason in
itself).
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
More information about the libvir-list
mailing list