[libvirt] [PATCH v3 1/2] process: wait longer on kill per assigned Hostdev
Christian Ehrhardt
christian.ehrhardt at canonical.com
Tue Aug 21 12:20:21 UTC 2018
On Tue, Aug 21, 2018 at 1:15 PM Daniel P. Berrangé <berrange at redhat.com>
wrote:
> On Tue, Aug 14, 2018 at 11:27:33AM +0200, Christian Ehrhardt wrote:
> > It was found that in cases with host devices virProcessKillPainfully
> > might be able to send signal zero to the target PID for quite a while
> > with the process already being gone from /proc/<PID>.
> >
> > That is due to cleanup and reset of devices which might include a
> > secondary bus reset that on top of the actions taken has a 1s delay
> > to let the bus settle. Due to that guests with plenty of Host devices
> > could easily exceed the default timeouts.
> >
> > To solve that, this adds an extra delay of 2s per hostdev that is
> associated
> > to a VM.
> >
> > Signed-off-by: Christian Ehrhardt <christian.ehrhardt at canonical.com>
> > ---
> > src/libvirt_private.syms | 1 +
> > src/qemu/qemu_process.c | 5 +++--
> > src/util/virprocess.c | 18 +++++++++++++++---
> > src/util/virprocess.h | 1 +
> > 4 files changed, 20 insertions(+), 5 deletions(-)
> >
> > diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
> > index ca4a192a4a..47ea35f864 100644
> > --- a/src/libvirt_private.syms
> > +++ b/src/libvirt_private.syms
> > @@ -2605,6 +2605,7 @@ virProcessGetPids;
> > virProcessGetStartTime;
> > virProcessKill;
> > virProcessKillPainfully;
> > +virProcessKillPainfullyDelay;
> > virProcessNamespaceAvailable;
> > virProcessRunInMountNamespace;
> > virProcessSchedPolicyTypeFromString;
> > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> > index 02fdc55156..b7bf8813da 100644
> > --- a/src/qemu/qemu_process.c
> > +++ b/src/qemu/qemu_process.c
> > @@ -6814,8 +6814,9 @@ qemuProcessKill(virDomainObjPtr vm, unsigned int
> flags)
> > return 0;
> > }
> >
> > - ret = virProcessKillPainfully(vm->pid,
> > - !!(flags &
> VIR_QEMU_PROCESS_KILL_FORCE));
> > + ret = virProcessKillPainfullyDelay(vm->pid,
> > + !!(flags &
> VIR_QEMU_PROCESS_KILL_FORCE),
> > + vm->def->nhostdevs * 2);
>
> This API contract is a bit wierd. You've got an arbitray x2 multiplier
> here...
>
Out of the discussion with Alex I derived that due to the 1 sec settling of
the bus plus some extra work 2 seconds per device would be a good rule of
thumb.
The delay is meant to be in seconds and here it requests 2*nhostdevs to get
the amount needed.
And I'd not want to make the "unit" of the call "double-seconds" :-)
Let me add a comment at this call to explain the reasoning on the
multiplier here.
> diff --git a/src/util/virprocess.c b/src/util/virprocess.c
> > index ecea27a2d4..46360cc051 100644
> > --- a/src/util/virprocess.c
> > +++ b/src/util/virprocess.c
> > @@ -341,15 +341,19 @@ int virProcessKill(pid_t pid, int sig)
> > * Returns 0 if it was killed gracefully, 1 if it
> > * was killed forcibly, -1 if it is still alive,
> > * or another error occurred.
> > + *
> > + * Callers can proide an extra delay to wait longer
> > + * than the default.
>
> Mention that it is in "seconds"
>
Ack
> */
> > int
> > -virProcessKillPainfully(pid_t pid, bool force)
> > +virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int
> extradelay)
> > {
> > size_t i;
> > int ret = -1;
> > + unsigned int delay = 75 + (extradelay*5);
>
> ...and it gets another x5 multiplier here.
>
That *5 is due to the internal unit it polls being 0.2 seconds.
I think externally anything other than seconds (double-seconds,
half-deca-seconds, ...) makes no sense.
I'll call it "polldelay" instead to make it more clear that.
That uncommon *5 ratio was there before, just not as redable (it just
defined 75 in the loop header).
I'll add some text about it as well to be sure.
> Feels nicer to just have the caller provide a x10 multiplier and
> honour that directly
>
See above, the unit being seconds seems the most readable IMHO.
With the changes made I think it won't be as confusing.
> const char *signame = "TERM";
> >
> > - VIR_DEBUG("vpid=%lld force=%d", (long long)pid, force);
> > + VIR_DEBUG("vpid=%lld force=%d delay=%u", (long long)pid, force,
> pid);
>
> s/pid/extradelay/
>
Yes, that was mentioned by Bjoern and queued that way.
Also in the text the delay= should be extradelay= which is ready that way.
>
> > /* This loop sends SIGTERM, then waits a few iterations (10 seconds)
> > * to see if it dies. If the process still hasn't exited, and
> > @@ -357,9 +361,12 @@ virProcessKillPainfully(pid_t pid, bool force)
> > * wait up to 5 seconds more for the process to exit before
> > * returning.
> > *
> > + * An extra delay can be specified for cases that are expected to
> clean
> > + * up slower than usual.
> > + *
> > * Note that setting @force could result in dataloss for the
> process.
> > */
> > - for (i = 0; i < 75; i++) {
> > + for (i = 0; i < delay; i++) {
> > int signum;
> > if (i == 0) {
> > signum = SIGTERM; /* kindly suggest it should exit */
> > @@ -402,6 +409,11 @@ virProcessKillPainfully(pid_t pid, bool force)
> > }
> >
> >
> > +int virProcessKillPainfully(pid_t pid, bool force)
> > +{
> > + return virProcessKillPainfullyDelay(pid, force, 0);
> > +}
> > +
> > #if HAVE_SCHED_GETAFFINITY
> >
> > int virProcessSetAffinity(pid_t pid, virBitmapPtr map)
> > diff --git a/src/util/virprocess.h b/src/util/virprocess.h
> > index 3c5a882772..b72603ca8e 100644
> > --- a/src/util/virprocess.h
> > +++ b/src/util/virprocess.h
> > @@ -55,6 +55,7 @@ virProcessWait(pid_t pid, int *exitstatus, bool raw)
> > int virProcessKill(pid_t pid, int sig);
> >
> > int virProcessKillPainfully(pid_t pid, bool force);
> > +int virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int);
>
> Missing parameter name here
>
Ack, and then breaking it into new lines for 80 chars
Thanks all these will be part of a v4 ready after some syntax check and
test builds ...
Regards,
> Daniel
> --
> |: https://berrange.com -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o-
> https://www.instagram.com/dberrange :|
>
--
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/libvir-list/attachments/20180821/1d45df72/attachment-0001.htm>
More information about the libvir-list
mailing list