[libvirt] [PATCH] migration: Usable time statistics without requiring NTP

Jiri Denemark jdenemar at redhat.com
Thu Apr 23 13:25:31 UTC 2015


On Thu, Apr 23, 2015 at 11:40:11 +0200, Michal Privoznik wrote:
> On 23.04.2015 11:18, Jiri Denemark wrote:
> > virDomainGetJobStats is able to report statistics of a completed
> > migration; however, to get usable downtime and total time statistics,
> > both hosts have to keep their clocks synchronized. To provide at least
> > some estimate of the times even when NTP daemons are not running on
> > both hosts, we can simply ignore the time needed to transfer a
> > migration cookie to the destination host. The result will still be
> > inaccurate, but somewhat more predictable: the real total time and
> > downtime will be at least what we report.
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1213434
> > Signed-off-by: Jiri Denemark <jdenemar at redhat.com>
> > ---
> >  include/libvirt/libvirt-domain.h | 23 ++++++++++++++++++++++-
> >  src/qemu/qemu_domain.c           | 15 +++++++++++++++
> >  src/qemu/qemu_domain.h           |  9 +++++++++
> >  src/qemu/qemu_migration.c        | 26 +++++++++++++-------------
> >  tools/virsh-domain.c             | 16 ++++++++++++++++
> >  5 files changed, 75 insertions(+), 14 deletions(-)
> > 
> 
> > diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
> > index 1da687c..4b3143f 100644
> > --- a/src/qemu/qemu_migration.c
> > +++ b/src/qemu/qemu_migration.c
> 
> > @@ -3438,18 +3443,9 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver,
> >      /* Update total times with the values sent by the destination daemon */
> >      if (mig->jobInfo) {
> >          qemuDomainObjPrivatePtr priv = vm->privateData;
> > -        if (priv->job.completed) {
> > -            qemuDomainJobInfoPtr jobInfo = priv->job.completed;
> > -            if (mig->jobInfo->status.downtime_set) {
> > -                jobInfo->status.downtime = mig->jobInfo->status.downtime;
> > -                jobInfo->status.downtime_set = true;
> > -            }
> > -            if (mig->jobInfo->timeElapsed)
> > -                jobInfo->timeElapsed = mig->jobInfo->timeElapsed;
> > -        } else {
> > -            priv->job.completed = mig->jobInfo;
> > -            mig->jobInfo = NULL;
> > -        }
> > +        VIR_FREE(priv->job.completed);
> > +        priv->job.completed = mig->jobInfo;
> > +        mig->jobInfo = NULL;
> >      }
> >  
> >      if (flags & VIR_MIGRATE_OFFLINE)
> > @@ -4041,6 +4037,7 @@ qemuMigrationRun(virQEMUDriverPtr driver,
> >      if (priv->job.completed) {
> >          qemuDomainJobInfoUpdateTime(priv->job.completed);
> >          qemuDomainJobInfoUpdateDowntime(priv->job.completed);
> > +        ignore_value(virTimeMillisNow(&priv->job.completed->sent));
> 
> So here you mark the start time of the migration (on the source)...

This is actually the end of the migration, i.e., just before we send the
cookie to the destination.

> 
> >      }
> >  
> >      if (priv->job.current->type == VIR_DOMAIN_JOB_UNBOUNDED)
> > @@ -5164,8 +5161,11 @@ qemuMigrationFinish(virQEMUDriverPtr driver,
> >          }
> >  
> >          if (mig->jobInfo) {
> > -            priv->job.completed = mig->jobInfo;
> > +            qemuDomainJobInfoPtr jobInfo = mig->jobInfo;
> > +            priv->job.completed = jobInfo;
> >              mig->jobInfo = NULL;
> > +            if (jobInfo->sent && virTimeMillisNow(&jobInfo->received) == 0)
> > +                jobInfo->timeDelta = jobInfo->received - jobInfo->sent;
> 
> ... and here, once the migration is finished, you compute the time
> difference.

And here, when we get the cookie, we compute the difference between now
and the time stored in "sent". This means timeDelta will contain the
difference between the destination's clock and the source's clock plus
any time required to transfer and process the cookie.

> What I am worried about is, what if time on both machines is
> so off that this value makes no sense (e.g. timeDelta would be a
> negative number)?

timeDelta is signed and can of course be negative (if the destination is
behind the source); there's no problem with this. The only problem would
be a difference so large that it overflows, but that would require a
difference greater than 2^63 milliseconds, which is about 292 million
years and far beyond any clock skew we could see in practice. I don't
think we need to worry about this.

> Moreover, isn't it a migration prerequisite that the time on both
> machines is synchronized?

No, there's no such requirement.

Jirka




More information about the libvir-list mailing list