[libvirt] [RFC PATCH 00/10] Resolve libvirtd hang on termination with connected long running client

John Ferlan jferlan at redhat.com
Wed Jan 10 17:23:25 UTC 2018


This RFC pulls together a couple of different patch postings into one
central "stream" of patches that can be discussed for their relative
importance or need to fix the problem.

Although a bit long winded, I think I've captured enough history for
anyone so inclined to walk through it and understand the maze of twisty
patches that it takes to hopefully resolve the issue.

The first two patches were presented previously, but not accepted:

  https://www.redhat.com/archives/libvir-list/2017-November/msg00296.html
  https://www.redhat.com/archives/libvir-list/2017-November/msg00297.html

However, since that time it seems "some form" of the patches is necessary,
most importantly making sure the virObjectUnref for @srv and @srvAdm occurs
*prior to* the virNetDaemonClose(dmn); at cleanup (IOW: out of order for a
reason). Doing that also requires that any program started on the servers
has its virObjectUnref prior to the daemon close.
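
As a rough illustration of the cleanup ordering described above, here is
a tiny self-contained sketch. It is not libvirt code: the toy_* functions
are hypothetical stubs that only record the order in which they are
called, and they do not model what virObjectUnref or virNetDaemonClose
actually do. The relative order of the program unref and the server
unrefs is an assumption made for the sketch; the only point taken from
the text is that all of the unrefs come before the daemon close.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Records the order of cleanup steps; purely illustrative. */
static char toy_trace[128];

static void toy_record(const char *step)
{
    size_t used = strlen(toy_trace);

    /* Make sure the step plus its " > " separator still fits. */
    assert(used + strlen(step) + 4 < sizeof(toy_trace));
    snprintf(toy_trace + used, sizeof(toy_trace) - used, "%s%s",
             used ? " > " : "", step);
}

/* Hypothetical stand-ins: they log the call and nothing else. */
static void toy_unref(const char *what) { toy_record(what); }
static void toy_daemon_close(void) { toy_record("close"); }

/* Cleanup in the order the text asks for: every unref first, close last. */
static const char *toy_cleanup(void)
{
    toy_trace[0] = '\0';
    toy_unref("unref:program");   /* program started on a server */
    toy_unref("unref:srv");
    toy_unref("unref:srvAdm");
    toy_daemon_close();           /* only after all of the unrefs */
    return toy_trace;
}
```

Calling toy_cleanup() returns the recorded sequence, with the close step
as the last entry.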

The 3rd and 4th patches are a result of discussions held in mid
December related to libvirtd crashes/hangs and some possible adjustments
to help. Discussion starts here:

  https://www.redhat.com/archives/libvir-list/2017-December/msg00515.html

This led to suggestions to move the toggling of services from Dispose
to Close *and* to split the virThreadPoolFree into a Drain function
that could also be called during the Close function rather than waiting
for the Dispose to occur.
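
To make the Drain/Free split a bit more concrete, here is a minimal
sketch using plain pthreads. It is not libvirt code: ToyPool and every
toy_pool_* name are hypothetical, and finishing the queued jobs during
drain is just a choice made for this toy, not a statement about what the
patch does. The point is only that drain stops and joins the workers
while the pool memory stays valid, so a Close path could call it early
and a later Free would find the threads already gone.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_WORKERS 2

/* Hypothetical toy pool; jobs are modeled as a simple counter. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    pthread_t workers[TOY_WORKERS];
    int pending;    /* queued jobs not yet picked up */
    int completed;  /* jobs finished by the workers */
    bool quit;      /* set by drain: empty the queue, then exit */
    bool drained;   /* workers have already been told to stop */
} ToyPool;

static void *toy_worker(void *opaque)
{
    ToyPool *pool = opaque;

    pthread_mutex_lock(&pool->lock);
    for (;;) {
        while (pool->pending == 0 && !pool->quit)
            pthread_cond_wait(&pool->cond, &pool->lock);
        if (pool->pending == 0)
            break;              /* quit requested and nothing left to do */
        pool->pending--;
        /* A real pool would drop the lock while running the job. */
        pool->completed++;
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

static ToyPool *toy_pool_new(void)
{
    ToyPool *pool = calloc(1, sizeof(*pool));

    if (!pool)
        return NULL;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->cond, NULL);
    for (int i = 0; i < TOY_WORKERS; i++) {
        if (pthread_create(&pool->workers[i], NULL, toy_worker, pool) != 0)
            abort();            /* keep the sketch simple on failure */
    }
    return pool;
}

static void toy_pool_submit(ToyPool *pool)
{
    pthread_mutex_lock(&pool->lock);
    if (!pool->quit) {          /* no new work once drain has started */
        pool->pending++;
        pthread_cond_signal(&pool->cond);
    }
    pthread_mutex_unlock(&pool->lock);
}

/* Stop and join the workers; the pool itself stays allocated.
 * Calling it again after it has returned is a no-op. */
static void toy_pool_drain(ToyPool *pool)
{
    assert(pool != NULL);
    pthread_mutex_lock(&pool->lock);
    if (pool->drained) {
        pthread_mutex_unlock(&pool->lock);
        return;
    }
    pool->quit = true;
    pool->drained = true;
    pthread_cond_broadcast(&pool->cond);
    pthread_mutex_unlock(&pool->lock);

    for (int i = 0; i < TOY_WORKERS; i++)
        pthread_join(pool->workers[i], NULL);
}

/* Free still drains, so callers that never called drain keep working. */
static void toy_pool_free(ToyPool *pool)
{
    if (!pool)
        return;
    toy_pool_drain(pool);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->cond);
    free(pool);
}
```

In this toy, a close path would call toy_pool_drain() and the eventual
dispose path would call toy_pool_free(), which then only has to release
the mutex, condition variable and memory.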

Still, testing showed that just those 4 patches weren't enough, as
libvirtd ended up just "hung" because of some patches Nikolay posted
that add a new shutdown state, see:

  https://www.redhat.com/archives/libvir-list/2017-October/msg01134.html

Those patches languished mainly because the relationship between them
and another series dealing with libvirtd crashes, which was partially
accepted and pushed, wasn't clear (at the time):

  https://www.redhat.com/archives/libvir-list/2017-October/msg01347.html

and followup discussion starting here:

  https://www.redhat.com/archives/libvir-list/2017-November/msg00023.html

The 9th patch can be used to test that the first 8 do the job. The
details on how I set up the test environment are in the patch. If the
sequence is run before the first 8 patches, you will end up with a
couple of different hang scenarios. So if you're compelled to see
what the big deal is, then apply this one alone and have fun playing.

The 10th patch is the one patch from the partially pushed series that
wasn't pushed as it was not deemed necessary. It's presented here mainly
for completeness.

John Ferlan (5):
  libvirtd: Alter refcnt processing for domain server objects
  libvirtd: Alter refcnt processing for server program objects
  netserver: Toggle service off during close
  qemu: Introduce virTheadPoolDrain
  APPLY ONLY FOR TESTING PURPOSES

Nikolay Shirokovskiy (5):
  libvirt: introduce hypervisor driver shutdown function
  qemu: implement state driver shutdown function
  qemu: agent: fix monitor close during first sync
  qemu: monitor: check monitor not closed upon send
  libvirtd: fix crash on termination

 daemon/libvirtd.c             | 46 ++++++++++++++++++++++++++++++++-----------
 src/driver-state.h            |  4 ++++
 src/libvirt.c                 | 18 +++++++++++++++++
 src/libvirt_internal.h        |  1 +
 src/libvirt_private.syms      |  2 ++
 src/qemu/qemu_agent.c         | 14 ++++++-------
 src/qemu/qemu_driver.c        | 44 +++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_monitor.c       | 27 ++++++++++++-------------
 src/rpc/virnetdaemon.c        |  1 +
 src/rpc/virnetserver.c        |  5 ++---
 src/rpc/virnetserverservice.c |  2 ++
 src/util/virthreadpool.c      | 19 ++++++++++++------
 src/util/virthreadpool.h      |  2 ++
 13 files changed, 143 insertions(+), 42 deletions(-)

-- 
2.13.6



