[libvirt] [PATCH v2 7/9] rpc: Alter virNetDaemonQuit processing
Daniel P. Berrangé
berrange at redhat.com
Mon Jul 16 09:18:59 UTC 2018
On Thu, Jul 12, 2018 at 03:34:00PM -0400, John Ferlan wrote:
>
>
> On 07/09/2018 03:11 AM, Peter Krempa wrote:
> > On Sat, Jul 07, 2018 at 08:11:05 -0400, John Ferlan wrote:
> >> When virNetDaemonQuit is called from libvirtd's shutdown
> >> handler (daemonShutdownHandler) we need to perform the quit
> >> in multiple steps. The first part is to "request" the quit
> >> and notify the NetServer's of the impending quit which causes
> >> the NetServers to inform their workers that a quit was requested.
> >>
> >> Still because we cannot guarantee a quit will happen or it's
> >> possible there's no workers pending, use a virNetDaemonQuitTimer
> >> to not only break the event loop but keep track of how long we're
> >> waiting and we've waited too long, force an ungraceful exit so
> >> that we don't hang waiting forever or cause some sort of SEGV
> >> because something is still pending and we Unref things.
> >>
> >> Signed-off-by: John Ferlan <jferlan at redhat.com>
> >> ---
> >> src/libvirt_remote.syms | 1 +
> >> src/remote/remote_daemon.c | 1 +
> >> src/rpc/virnetdaemon.c | 68 +++++++++++++++++++++++++++++++++++++-
> >> src/rpc/virnetdaemon.h | 4 +++
> >> 4 files changed, 73 insertions(+), 1 deletion(-)
> >
> > [...]
> >
> >> @@ -855,10 +904,27 @@ virNetDaemonRun(virNetDaemonPtr dmn)
> >> virObjectLock(dmn);
> >>
> >> virHashForEach(dmn->servers, daemonServerProcessClients, NULL);
> >> +
> >> + /* HACK: Add a dummy timeout to break event loop */
> >> + if (dmn->quitRequested && quitTimer == -1)
> >> + quitTimer = virEventAddTimeout(500, virNetDaemonQuitTimer,
> >> + &quitCount, NULL);
> >> +
> >> + if (dmn->quitRequested && daemonServerWorkersDone(dmn)) {
> >> + dmn->quit = true;
> >> + } else {
> >> + /* Firing every 1/2 second and quitTimeout in seconds, force
> >> + * an exit when there are still worker threads running and we
> >> + * have waited long enough */
> >> + if (quitCount > dmn->quitTimeout * 2)
> >> + _exit(EXIT_FAILURE);
> >
> > If you have a legitimate long-running job which would finish eventually
> > and e.g. write an updated status XML this will break things. I'm not
> > persuaded that this is a systematic solution to some API getting stuck.
> >
> > The commit message also does not help persuading me that this is a good
> > idea.
> >
> > NACK
> >
>
> I understand the sentiment and this hasn't been a "top of the list" item
> for me to think about, but to a degree the purpose of re-posting the
> series was to try to come to a consensus over some solution. A naked
> NACK almost makes it appear as if you would prefer to not do anything in
> this situation, letting nature takes it's course (so to speak).
>
> Do you have any thoughts or suggestions related to what processing you
> believe is appropriate if someone issues a SIG{INT|QUIT|TERM} to a
> daemon (libvirtd|lockd|logd}? That is what to do if we reach the
> cleanup: label in main() of src/remote/remote_daemon.c and
> virNetDaemonClose() gets run? Right now IIRC it's either going to be a
> SEGV or possible long wait for some worker thread to complete. The
> latter is the don't apply this example to cause a wait in block stats
> fetch. Naturally the long term wait only matters up through the point
> while someone is patient before issuing a SIGKILL.
>
> Do you favor waiting forever for some worker thread to finish? To some
> degree if some signal is sent to libvirtd, then I would think the
> expectation is to not wait for some indeterminate time. And most
> definitely trying to avoid some sort of defunct state resolvable by a
> host reboot. Knowing exactly what a worker thread is doing may help, but
> that'd take a bit (;-)) of code to trace the stack. Perhaps providing a
> list of client/workers still active or even some indication that we're
> waiting on some worker rather than no feedback? How much is "too much"?
>
> Another option that was proposed, but didn't really gain any support was
> introduction of a stateShutdown in virStateDriver that would be run
> during libvirtd's cleanup: processing prior to virNetDaemonClose. That
> would be before virStateCleanup. For qemu, the priv->mon and priv->agent
> would be closed. That would seemingly generate an error and cause the
> worker to exit thus theoretically not needing any forced quit of the
> thread "if" the reason was some sort of hang as a result of
> monitor/agent processing. Doesn't mean there wouldn't be some other
> issue causing a hang. Refreshing my memory on this - there was some
> discussion or investigation related to using virStateStop, but I since
> that's gated by "WITH_DBUS" being defined, it wasn't clear if/how it
> could be used (daemonStop is only called/setup if daemonRunStateInit has
> set it up).
I think the key thing to bear in mind is what is the benefit of what we
are trying todo. ie why do we need todo a clean shutdown instead of just
immediately exiting. The kernel will clean up all resources associated
with the process when it exits. So we only need care about a clean shutdown
if there is something needing cleanup that the kernel won't handle, or if
we are trying to debug with valgrind & similar tools.
I tend to think we put way too much time into worrying about getting perfect
clean shutdown, for little real world benefit.
IMHO, we should make a moderate effort to shutdown cleanly, but if there's
something stuck after 15 seconds, we should just uncoditionally call exit().
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
More information about the libvir-list
mailing list