[libvirt] libvirtd deadlock on shutdown
Hu Tao
hutao at cn.fujitsu.com
Fri Jun 22 03:38:14 UTC 2012
Cced Danp.
I'm sorry for the unconvenience, but my mail server occasionally cuts my
CC list for unknown reason.
On Fri, Jun 22, 2012 at 11:26:03AM +0800, Hu Tao wrote:
> On Thu, Jun 21, 2012 at 09:56:54AM -0600, Jim Fehlig wrote:
> > Daniel P. Berrange wrote:
> > > On Wed, Jun 20, 2012 at 03:07:16PM -0600, Jim Fehlig wrote:
> > >
> > >> I'm looking into a libvirtd deadlock on daemon shutdown. The deadlock
> > >> occurs when shutting down virNetServer. If the handling of a job is in
> > >> flight in virNetServerHandleJob(), the virNetServer lock is acquired
> > >> when freeing job->prog (src/rpc/virnetserver.c:167). But the lock is
> > >> already held in virNetServerFree(), which is blocked in
> > >> virThreadPoolFree() waiting for all the workers to finish. No progress
> > >> can be made.
> > >>
> > >> The attached hack fixes the problem, but I'm not convinced this is an
> > >> appropriate fix. Is it necessary to hold the virNetServer lock when
> > >> calling virNetServerProgramFree(job->prog)? I notice the lock is not
> > >> held in the error path of virNetServerHandleJob().
> > >>
> > >> Thanks,
> > >> Jim
> > >>
> > >>
> > >
> > >
> > >> diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c
> > >> index ae19e84..edd3196 100644
> > >> --- a/src/rpc/virnetserver.c
> > >> +++ b/src/rpc/virnetserver.c
> > >> @@ -774,7 +774,9 @@ void virNetServerFree(virNetServerPtr srv)
> > >> for (i = 0 ; i < srv->nservices ; i++)
> > >> virNetServerServiceToggle(srv->services[i], false);
> > >>
> > >> + virNetServerUnlock(srv);
> > >> virThreadPoolFree(srv->workers);
> > >> + virNetServerLock(srv);
> > >>
> > >> for (i = 0 ; i < srv->nsignals ; i++) {
> > >> sigaction(srv->signals[i]->signum, &srv->signals[i]->oldaction, NULL);
> > >>
> > >
> > > The virNetServerPtr object is ref-counted, so technical;ly
> > > only decrementing the ref count needs to be protected. Once
> > > the ref count hits 0, then only the current thread (and the
> > > workers) should be using the virNetServerPtr instance.
> > >
> >
> > Ah, right. Thanks for clarifying.
> >
> > > So, yes, it is safe to call virNetServerUnlock before
> > > virThreadPoolFree. Furthermore, it is /not/ required
> > > to call virNetServerLock afterwards - no other thread
> > > should be using it now the workers are dead. So you
> > > can skip that extra lock call, and also remove the
> > > unlock call much later
> > >
> >
> > Something like the attached patch?
> >
> > Regards,
> > Jim
> >
>
> > >From 583be33213e922899b23f036494886397b2549dc Mon Sep 17 00:00:00 2001
> > From: Jim Fehlig <jfehlig at suse.com>
> > Date: Thu, 21 Jun 2012 09:21:44 -0600
> > Subject: [PATCH] Fix deadlock on libvirtd shutdown
> >
> > When shutting down libvirtd, the virNetServer shutdown can deadlock
> > if there are in-flight jobs being handled by virNetServerHandleJob().
> > virNetServerFree() will acquire the virNetServer lock and call
> > virThreadPoolFree() to terminate the workers, waiting for the workers
> > to finish. But in-flight workers will attempt to acquire the
> > virNetServer lock, resulting in deadlock.
> >
> > Fix the deadlock by unlocking the virNetServer lock before calling
> > virThreadPoolFree(). This is safe since the virNetServerPtr object
> > is ref-counted and only decrementing the ref count needs to be
> > protected. Additionally, there is no need to re-acquire the lock
> > after virThreadPoolFree() completes as all the workers have
> > terminated.
> > ---
> > src/rpc/virnetserver.c | 6 ++----
> > 1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c
> > index ae19e84..9d71e53 100644
> > --- a/src/rpc/virnetserver.c
> > +++ b/src/rpc/virnetserver.c
> > @@ -766,10 +766,9 @@ void virNetServerFree(virNetServerPtr srv)
> > virNetServerLock(srv);
> > VIR_DEBUG("srv=%p refs=%d", srv, srv->refs);
> > srv->refs--;
> > - if (srv->refs > 0) {
> > - virNetServerUnlock(srv);
> > + virNetServerUnlock(srv);
>
> At this point other threads may have changed srv->refs...
>
> > + if (srv->refs > 0)
>
> ...so it's unsafe to test srv->refs here without locking.
>
> For example, assume srv->refs is 2 at the beginning of virNetServerFree,
>
> thread A thread B
>
> lock srv lock srv (blocked)
>
> dec srv->refs
>
> (srv->refs is 1)
>
> unlock srv
>
> lock srv
>
> dec srv->refs
>
> (srv->refs is 0)
>
> unlock src
>
> test srv->refs test srv->refs
>
>
> In this case both threads have found srv->refs is 0 and are going to
> free srv...
>
> Following patch fixes problem.
>
> >From 25aa97e05aa76054b781a4e5e83781ee16d5afee Mon Sep 17 00:00:00 2001
> From: Hu Tao <hutao at cn.fujitsu.com>
> Date: Fri, 22 Jun 2012 11:15:01 +0800
> Subject: [PATCH] fix a bug of ref count in virnetserver.c
>
> The test of ref count is not protected by lock, which is unsafe because
> the ref count may have been changed by other threads during the test.
>
> This patch fixes this.
> ---
> src/rpc/virnetserver.c | 5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/rpc/virnetserver.c b/src/rpc/virnetserver.c
> index 9d71e53..247ddd7 100644
> --- a/src/rpc/virnetserver.c
> +++ b/src/rpc/virnetserver.c
> @@ -759,15 +759,16 @@ void virNetServerQuit(virNetServerPtr srv)
> void virNetServerFree(virNetServerPtr srv)
> {
> int i;
> + int refs;
>
> if (!srv)
> return;
>
> virNetServerLock(srv);
> VIR_DEBUG("srv=%p refs=%d", srv, srv->refs);
> - srv->refs--;
> + refs = --srv->refs;
> virNetServerUnlock(srv);
> - if (srv->refs > 0)
> + if (refs > 0)
> return;
>
> for (i = 0 ; i < srv->nservices ; i++)
> --
> 1.7.4.4
>
>
> --
> Thanks,
> Hu Tao
>
> --
> libvir-list mailing list
> libvir-list at redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
--
Thanks,
Hu Tao
More information about the libvir-list
mailing list