[libvirt] An implementation bug in qemuMigrationWaitForCompletion that introduces an unnecessary sleep of 50 ms when a migration completes

Xing Lin linxingnku at gmail.com
Fri Apr 10 08:09:33 UTC 2015


I was thinking that since a migration won't finish within the first 50 ms, we
are going to do at least one sleep anyway, so I just moved the sleep to the
top of the loop.
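
To make the change concrete, here is roughly what the restructured loop would
look like (a sketch reusing the variables and helpers from the snippet quoted
below; not a tested patch):

   while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
        /* Poll every 50ms for progress & to allow cancellation */
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };

        /* Sleep first: a migration will not finish within the first 50 ms,
         * so this costs nothing, and once the job leaves the UNBOUNDED
         * state the loop condition exits without an extra sleep. */
        virObjectUnlock(vm);
        nanosleep(&ts, NULL);
        virObjectLock(vm);

        if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1)
            break;

        /* cancel migration if disk I/O error is emitted while migrating */
        if (abort_on_error &&
            virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED &&
            pauseReason == VIR_DOMAIN_PAUSED_IOERROR) {
            virReportError(VIR_ERR_OPERATION_FAILED,
                           _("%s: %s"), job, _("failed due to I/O error"));
            break;
        }

        if (dconn && virConnectIsAlive(dconn) <= 0) {
            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                           _("Lost connection to destination host"));
            break;
        }
    }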

I agree that the ideal approach is to wait for migration completion event
notification from Qemu, rather than polling every 50 ms.
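
Just to illustrate what event-based waiting would look like in general terms,
here is a generic condition-variable sketch. This is only an illustration of
the pattern (plain pthreads, with a hypothetical on_migration_event() handler);
it is not the actual libvirt or QEMU interface, which does not exist yet:

   #include <pthread.h>
   #include <stdbool.h>

   static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
   static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
   static bool migration_done = false;

   /* Called from the event loop when the (hypothetical) completion
    * event arrives from QEMU. */
   static void on_migration_event(void)
   {
       pthread_mutex_lock(&lock);
       migration_done = true;
       pthread_cond_signal(&cond);
       pthread_mutex_unlock(&lock);
   }

   /* Blocks until completion is signalled, with no 50 ms polling. */
   static void wait_for_migration(void)
   {
       pthread_mutex_lock(&lock);
       while (!migration_done)
           pthread_cond_wait(&cond, &lock);
       pthread_mutex_unlock(&lock);
   }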

-Xing

On Fri, Apr 10, 2015 at 2:02 AM, Jiri Denemark <jdenemar at redhat.com> wrote:

> On Fri, Apr 10, 2015 at 09:49:43 +0200, Michal Privoznik wrote:
> > On 10.04.2015 00:32, Xing Lin wrote:
> > > Hi,
> > >
> > > It seems that I have found a bug in qemuMigrationWaitForCompletion(): it
> > > sleeps for 50 ms even when qemuMigrationUpdateJobStatus() detects that a
> > > migration has completed.
> > >
> > >
> > > Here is the original implementation of the while loop.
> > >
> > >
> > >    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> > >         /* Poll every 50ms for progress & to allow cancellation */
> > >         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };
> > >
> > >         if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1)
> > >             break;
> > >
> > >         /* cancel migration if disk I/O error is emitted while migrating */
> > >         if (abort_on_error &&
> > >             virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED &&
> > >             pauseReason == VIR_DOMAIN_PAUSED_IOERROR) {
> > >             virReportError(VIR_ERR_OPERATION_FAILED,
> > >                            _("%s: %s"), job, _("failed due to I/O error"));
> > >             break;
> > >         }
> > >
> > >         if (dconn && virConnectIsAlive(dconn) <= 0) {
> > >             virReportError(VIR_ERR_OPERATION_FAILED, "%s",
> > >                            _("Lost connection to destination host"));
> > >             break;
> > >         }
> > >
> > >         virObjectUnlock(vm);
> > >
> > >         nanosleep(&ts, NULL);
> > >
> > >         virObjectLock(vm);
> > >     }
> > >
> > >
> > > Even when qemuMigrationUpdateJobStatus() detects that a migration has
> > > completed, nanosleep() is still called again; only when the while
> > > condition is re-checked does the loop finally exit. This introduces an
> > > unnecessary sleep of 50 ms, which can be avoided by doing the sleep
> > > first. The code should likely be structured as follows.
> > >
> > >    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> > >         /* Poll every 50ms for progress & to allow cancellation */
> > >         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };
> > >
> > >         // do sleep here.
> > >         virObjectUnlock(vm);
> > >
> > >         nanosleep(&ts, NULL);
> > >
> > >         virObjectLock(vm);
> >
> > Yeah, I guess it's very unlikely that a migration will complete within the
> > first 50ms after issuing the command on the monitor (in which case we
> > would again wait pointlessly). The other approach would be to leave the
> > sleep() where it is, but enclose it in if (jobInfo->type ==
> > VIR_DOMAIN_JOB_UNBOUNDED) {}. I have no preference between these two
> > versions, though. So ACK to your patch. However, I'll postpone merging
> > it for a while and let others express their feelings.
>
> Or change the loop to while (1) and add a break before the sleep if
> jobInfo->type is not VIR_DOMAIN_JOB_UNBOUNDED.
>
> Personally, I don't mind either way. The ultimate solution is to avoid
> any sleeps by waiting for an event. However, Juan has not implemented the
> event in QEMU yet. Once he does, I will make patches for libvirt that
> will use the event instead of polling every 50ms.
>
> Jirka
>
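
For comparison, the while (1) variant Jiri suggests above would look roughly
like this (again a sketch reusing the names from the quoted snippet, with the
error checks elided; not a tested patch):

   while (1) {
        /* Poll every 50ms for progress & to allow cancellation */
        struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };

        if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1)
            break;

        /* ... same I/O-error and destination-connection checks as above ... */

        /* Break before sleeping once the job is no longer unbounded, so a
         * completed migration never pays the extra 50 ms. */
        if (jobInfo->type != VIR_DOMAIN_JOB_UNBOUNDED)
            break;

        virObjectUnlock(vm);
        nanosleep(&ts, NULL);
        virObjectLock(vm);
    }

Both restructurings avoid the trailing 50 ms sleep once the migration has
completed; Michal's suggestion of wrapping the existing sleep in
if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {} would have the same effect.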