[libvirt] [PATCHv3] qemu: fix crash when mixing sync and async monitor jobs

Wen Congyang wency at cn.fujitsu.com
Mon Aug 1 03:00:50 UTC 2011


At 07/30/2011 05:32 AM, Eric Blake Write:
> Currently, we attempt to run sync job and async job at the same time. It
> means that the monitor commands for two jobs can be run in any order.
> 
> In the function qemuDomainObjEnterMonitorInternal():
>     if (priv->job.active == QEMU_JOB_NONE && priv->job.asyncJob) {
>         if (qemuDomainObjBeginNestedJob(driver, obj) < 0)
> We check whether the caller is an async job by priv->job.active and
> priv->job.asynJob. But when an async job is running, and a sync job is
> also running at the time of the check, then priv->job.active is not
> QEMU_JOB_NONE. So we cannot check whether the caller is an async job
> in the function qemuDomainObjEnterMonitorInternal(), and must instead
> put the burden on the caller to tell us when an async command wants
> to do a nested job.
> 
> Once the burden is on the caller, then only async monitor enters need
> to worry about whether the VM is still running; for sync monitor enter,
> the internal return is always 0, so lots of ignore_value can be dropped.
> 
> * src/qemu/THREADS.txt: Reflect new rules.
> * src/qemu/qemu_domain.h (qemuDomainObjEnterMonitorAsync): New
> prototype.
> * src/qemu/qemu_process.h (qemuProcessStartCPUs)
> (qemuProcessStopCPUs): Add parameter.
> * src/qemu/qemu_migration.h (qemuMigrationToFile): Likewise.
> (qemuMigrationWaitForCompletion): Make static.
> * src/qemu/qemu_domain.c (qemuDomainObjEnterMonitorInternal): Add
> parameter.
> (qemuDomainObjEnterMonitorAsync): New function.
> (qemuDomainObjEnterMonitor, qemuDomainObjEnterMonitorWithDriver):
> Update callers.
> * src/qemu/qemu_driver.c (qemuDomainSaveInternal)
> (qemudDomainCoreDump, doCoreDump, processWatchdogEvent)
> (qemudDomainSuspend, qemudDomainResume, qemuDomainSaveImageStartVM)
> (qemuDomainSnapshotCreateActive, qemuDomainRevertToSnapshot):
> Likewise.
> * src/qemu/qemu_process.c (qemuProcessStopCPUs)
> (qemuProcessFakeReboot, qemuProcessRecoverMigration)
> (qemuProcessRecoverJob, qemuProcessStart): Likewise.
> * src/qemu/qemu_migration.c (qemuMigrationToFile)
> (qemuMigrationWaitForCompletion, qemuMigrationUpdateJobStatus)
> (qemuMigrationJobStart, qemuDomainMigrateGraphicsRelocate)
> (doNativeMigrate, doTunnelMigrate, qemuMigrationPerformJob)
> (qemuMigrationPerformPhase, qemuMigrationFinish)
> (qemuMigrationConfirm): Likewise.
> * src/qemu/qemu_hotplug.c: Drop unneeded ignore_value.
> ---
> 
> v3: incorporate Wen's feedback - in particular, virProcessStartCPUs
> now checks for return type, restarting libvirt does not use an
> async job, and I didn't hit the deadlock in the same scenarios as
> I tested under v2.

I diff this patch with v2, and all comments have been addressed.
And the deadlock also has been fixed.

> I still need to do migration testing before I'm convinced that this
> is right, but it's doing a lot better.

If migration testing pass, ACK.




More information about the libvir-list mailing list