[libvirt] An implementation bug in qemuMigrationWaitForCompletion that introduces an unnecessary sleep of 50 ms when a migration completes

Jiri Denemark jdenemar at redhat.com
Fri Apr 10 08:02:11 UTC 2015


On Fri, Apr 10, 2015 at 09:49:43 +0200, Michal Privoznik wrote:
> On 10.04.2015 00:32, Xing Lin wrote:
> > Hi,
> > 
> > It seems that I found a bug in the qemuMigrationWaitForCompletion(). It
> > will do a sleep of 50 ms, even when qemuMigrationUpdateJobStatus() detects
> > that a migration has completed.
> > 
> > 
> > Here is the original implementation of the while loop.
> > 
> > 
> >    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> >         /* Poll every 50ms for progress & to allow cancellation */
> >         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull
> > };
> > 
> >         if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1)
> >             break;
> > 
> >         /* cancel migration if disk I/O error is emitted while migrating */
> >         if (abort_on_error &&
> >             virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED &&
> >             pauseReason == VIR_DOMAIN_PAUSED_IOERROR) {
> >             virReportError(VIR_ERR_OPERATION_FAILED,
> >                            _("%s: %s"), job, _("failed due to I/O error"));
> >             break;
> >         }
> > 
> >         if (dconn && virConnectIsAlive(dconn) <= 0) {
> >             virReportError(VIR_ERR_OPERATION_FAILED, "%s",
> >                            _("Lost connection to destination host"));
> >             break;
> >         }
> > 
> >         virObjectUnlock(vm);
> > 
> >         nanosleep(&ts, NULL);
> > 
> >         virObjectLock(vm);
> >     }
> > 
> > 
> > Even when qemuMigrationUpdateJobStatus() detects a migration has completed,
> > the nanosleep() will be called again. When the while condition is checked,
> > then the program breaks from the while loop. This introduces an unnecessary
> > sleep of 50 ms which can be avoided by doing a sleep first. The code should
> > likely be structured as following.
> > 
> >    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> >         /* Poll every 50ms for progress & to allow cancellation */
> >         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull
> > };
> > 
> >         // do sleep here.
> >         virObjectUnlock(vm);
> > 
> >         nanosleep(&ts, NULL);
> > 
> >         virObjectLock(vm);
> 
> Yeah, I guess it's very unlikely that migration will complete within
> first 50ms after issuing the command on the monitor (in which case we
> would again wait pointlessly). The other approach would be to leave the
> sleep() where it is, but enclose it in if (jobInfo->type ==
> VIR_DOMAIN_JOB_UNBOUNDED) {}. I have no preference over these two
> versions though. So ACK to the patch of yours. However, I'll postpone
> merging the patch for a while and let others express their feelings.

Or change to loop to while (1) and add a break before the sleep if
jobInfo->type is not VIR_DOMAIN_JOB_UNBOUNDED.

Personally, I don't mind either way. The ultimate solution is to avoid
any sleeps by waiting for an event. However, Juan did not implement the
even in QEMU yet. Once he does that, I will make patches for libvirt
that will use that instead of polling every 50ms.

Jirka




More information about the libvir-list mailing list