[libvirt] [PATCH v2 2/2] qemu: avoid rare race when undefining domain

Michal Privoznik mprivozn at redhat.com
Mon Nov 3 08:54:09 UTC 2014


On 31.10.2014 15:47, Martin Kletzander wrote:
> When one domain is being undefined and at the same time started, for
> example, there is a possibility of a rare problem occuring.
>
>   - Thread 1 does virDomainUndefine(), has the lock, checks that the
>     domain is active and because it's not, calls
>     virDomainObjListRemove().
>
>   - Thread 2 does virDomainCreate() and tries to lock the domain.
>
>   - Thread 1 needs to lock domain list in order to remove the domain from
>     it, but must unlock domain first (proper order is to lock domain list
>     first and the domain itself second).
>
>   - Thread 2 grabs the lock, starts the domain and releases the lock.
>
>   - Thread 1 grabs the lock and removes the domain from list.
>
> With this patch:
>
>   - qemuDomainRemoveInactive() creates a QEMU_JOB_MODIFY if that's
>     possible, but since it must remove the domain from list either way,
>     it continues even when starting the job failed.
>
>   - Because virDomainObjListRemove() removes a reference to the
>     virDomainObj (and it must be kept that way for other drivers), we
>     hack around that by acquiring one more reference that's kept until
>     the job is ended.
>
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1150505
>
> Signed-off-by: Martin Kletzander <mkletzan at redhat.com>
> ---
>   src/qemu/qemu_domain.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
> index 76fccce..98c4763 100644
> --- a/src/qemu/qemu_domain.c
> +++ b/src/qemu/qemu_domain.c
> @@ -2392,9 +2392,13 @@ void
>   qemuDomainRemoveInactive(virQEMUDriverPtr driver,
>                            virDomainObjPtr vm)
>   {
> +    bool haveJob = true;
>       char *snapDir;
>       virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
>
> +    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
> +        haveJob = false;
> +
>       /* Remove any snapshot metadata prior to removing the domain */
>       if (qemuDomainSnapshotDiscardAllMetadata(driver, vm) < 0) {
>           VIR_WARN("unable to remove all snapshots for domain %s",
> @@ -2409,8 +2413,13 @@ qemuDomainRemoveInactive(virQEMUDriverPtr driver,
>               VIR_WARN("unable to remove snapshot directory %s", snapDir);
>           VIR_FREE(snapDir);
>       }
> +    virObjectRef(vm);
>       virDomainObjListRemove(driver->domains, vm);
>       virObjectUnref(cfg);
> +
> +    if (haveJob)
> +        ignore_value(qemuDomainObjEndJob(driver, vm));
> +    virObjectUnref(vm);
>   }
>
>   void
>

I don't think there's a reason to do the Ref() and Unref(). Job control 
functions do that already. ACK with that removed.

Michal




More information about the libvir-list mailing list