[PATCH 00/10] resolve hangs/crashes on libvirtd shutdown

Daniel P. Berrangé berrange at redhat.com
Wed Jul 15 14:14:37 UTC 2020

On Wed, Jul 15, 2020 at 08:51:03AM +0300, Nikolay Shirokovskiy wrote:
> On 14.07.2020 17:53, Daniel Henrique Barboza wrote:
> > As far as code goes:
> > 
> > 
> > Reviewed-by: Daniel Henrique Barboza <danielhb413 at gmail.com>
> > 
> > 
> > About the design I have a question about the timeout. Patch 5/10 is setting a
> > 15 second timeout. How did you reach this value? Reading the bug, especially
> > this comment from Daniel:
> > 
> > https://bugzilla.redhat.com/show_bug.cgi?id=1828207#c6
> > 
> > He mentions "give it 5 seconds of running before shutting it down".
> I guess 5 seconds is the time for libvirtd to finish startup. That time is of
> a different nature than the time libvirtd needs to finish its work on
> shutdown, so the two values can differ.
> > 
> > 5 seconds before shutdown is something that most users may find slightly
> > annoying but in the end won't mind that much, but 15 seconds is something
> > that will cause bugs to be opened because "Libvirt is taking too long to
> > shut down". Besides, it's a fair assumption that a transaction that takes
> > more than 5 or so seconds to finish is already compromised* - might as well
> > shut down the daemon and deal with the errors.
> 15 seconds was mentioned by Daniel in [1] when he first proposed the approach,
> so I used this value without any extra thought. However, I missed that in
> John's last series [2] the default waiting time is 0s. Maybe that is the
> current decision on the waiting time. Let's wait for others to join
> the review.

Don't read too much into the precise numbers I mentioned, they were just
plucked out of the air :-)

If there is some job taking place wrt a VM that is taking a long time to
complete and thus blocking shutdown, I think it is important to give it a
fair opportunity to finish gracefully.  systemd itself gives services
something like 90 seconds to exit before it gives up on them.

On a heavily loaded host, 5 seconds is almost certainly too short. 15
seconds is not bad, but I wouldn't object to 30 seconds either, as long
as we're emitting some log message warning that we're delayed.

In the "normal" case these timeouts won't be hit, so we're only delayed
in the scenarios where we're likely to be doing something important for
a VM.
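
To make the idea concrete, here is a rough sketch in Python rather than
libvirt's C. It is only an illustration of "wait with a limit, and warn once
we are delayed"; the names wait_for_jobs, jobs_done and warn_after are
hypothetical and not taken from the patches, and the 30s / 5s defaults are
placeholders, not a proposal.

```python
import logging
import threading
import time

log = logging.getLogger("shutdown_sketch")


def wait_for_jobs(jobs_done: threading.Event,
                  timeout: float = 30.0,
                  warn_after: float = 5.0) -> bool:
    """Wait up to `timeout` seconds for `jobs_done` to be set.

    Returns True if the pending jobs finished in time, False if we gave up.
    A warning is logged once if the wait lasts longer than `warn_after`.
    All names and defaults here are assumptions made for illustration.
    """
    warn_after = min(warn_after, timeout)
    start = time.monotonic()

    # Normal case: nothing is pending, the event is already set, and this
    # returns immediately without any delay or log output.
    if jobs_done.wait(warn_after):
        return True

    log.warning("shutdown delayed: waiting for VM jobs (limit %.0fs)", timeout)

    # Keep waiting for whatever is left of the overall budget.
    remaining = timeout - (time.monotonic() - start)
    if remaining > 0 and jobs_done.wait(remaining):
        return True

    log.error("gave up waiting for VM jobs after %.0fs", timeout)
    return False
```

With something of this shape, an idle daemon exits at once, and only a daemon
that is in the middle of a job pays the delay, with a log line explaining why.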

|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
