[libvirt] deadlock in remoteDispatchDomainUndefine vs daemonStreamHandleAbort

Michal Privoznik mprivozn at redhat.com
Mon Apr 1 14:35:15 UTC 2019


On 4/1/19 4:25 PM, Christian Ehrhardt wrote:
> Hi,
> I happened to analyze a bug report [1] I got from a friend, and for
> quite a while it was rather elusive. But I finally got it
> reproducible [2] enough to share it with the community.
> 
> The TL;DR of what I see is:
> - an automation script using python-libvirt gets a SIGINT
> - cleanup runs destroy and then undefine
> - the guest closes FDs due to SIGINT and/or destroy, which triggers
> daemonStreamHandleAbort
> - those two fight over the lock
> 
> This gets libvirtd into a deadlock, which ends up with all threads
> hung [4] and two of them in particular fighting over the lock [3] (details).
> 
> The two related stacks, summarized, look like this:
> 
> daemonStreamHandleWrite (failing to write)
>   -> daemonStreamHandleAbort (closing things and cleaning up)
>      -> ... virChrdevFDStreamCloseCb
>          virMutexLock(&priv->devs->lock);
> 
> # there is code meant to avoid such issues, emitting "Unable to close"
> # if a lock is held, but the log doesn't show this triggering even with
> # debug enabled
> 
> #10 seems triggered via an "undefine" call
>    remoteDispatchDomainUndefine
>    ... -> virChrdevFree
>       ... -> virFDStreamSetInternalCloseCb
>          -> virObjectLock(virFDStreamDataPtr fdst)
>            -> virMutexLock(&obj->lock);
>    # closing all streams of a guest (requiring the same locks)
> 
> While that already feels quite close, I struggle to see where exactly
> we'd want to fix it.
> But now that I finally have a repro script [2], I hope that someone else
> here might be able to help me with that.
> 
> After all, it is a race: on my s390x system it usually triggers in fewer
> than 5 tries, while on x86 I have needed up to 18 runs of the test to hang.
> Given different system configs it might be better or worse for you.
> 
> FYI we hit this with libvirt 4.0 initially but libvirt 5.0 was just the same.
> I haven't built 5.1 or a recent master, but the commits since 5.0
> didn't mention any issue that seems related. OTOH I'm willing and able
> to build and try suggestions if anyone comes up with ideas.
> 
> [1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096
> [2]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/+attachment/5251655/+files/test4.py
> [3]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/comments/3
> [4]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/comments/17
> 
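The two stacks in the report look like a lock-order inversion (an "ABBA" deadlock): one path takes the chardev list lock and then a stream object lock, the other path takes them in the opposite order. The stacks suggest but do not prove which lock each thread already holds, so the following is only a generic sketch under that assumption. The names `devs_lock`, `stream_lock`, `undefine_path`, `abort_path` and `run_stress` are hypothetical stand-ins, not libvirt's real code, and the trylock-with-back-off used here is one common mitigation, not necessarily what libvirt itself does.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins, NOT libvirt's real types or functions:
 *   devs_lock   ~ priv->devs->lock (chardev list lock)
 *   stream_lock ~ the virFDStreamData object lock */
static pthread_mutex_t devs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;
static int close_cb_registered; /* protected by stream_lock */
static int completed;           /* protected by devs_lock */

/* "undefine"-like path (assumed order): devs_lock first, then stream_lock. */
static void *undefine_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&devs_lock);
    pthread_mutex_lock(&stream_lock);
    close_cb_registered = 0;            /* clear the internal close callback */
    pthread_mutex_unlock(&stream_lock);
    completed++;
    pthread_mutex_unlock(&devs_lock);
    return NULL;
}

/* "stream abort"-like path (assumed order): holds stream_lock and then needs
 * devs_lock, the opposite order. Blocking on devs_lock here could deadlock
 * against undefine_path, so this sketch uses trylock and, on failure, drops
 * stream_lock and retries. */
static void *abort_path(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&stream_lock);
        if (pthread_mutex_trylock(&devs_lock) == 0)
            break;                      /* now holding both locks */
        pthread_mutex_unlock(&stream_lock);
        sched_yield();                  /* give the other path a chance */
    }
    completed++;
    pthread_mutex_unlock(&devs_lock);
    pthread_mutex_unlock(&stream_lock);
    return NULL;
}

/* Races both paths against each other `rounds` times and returns how many
 * path executions completed (2 per round if nothing hangs). */
static int run_stress(int rounds)
{
    completed = 0;
    for (int i = 0; i < rounds; i++) {
        pthread_t a, b;
        close_cb_registered = 1;
        pthread_create(&a, NULL, undefine_path, NULL);
        pthread_create(&b, NULL, abort_path, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        assert(close_cb_registered == 0);
    }
    return completed;
}
```

Built with `-pthread`, calling `run_stress(200)` should return 400 without hanging. Replacing the trylock loop in `abort_path` with a plain `pthread_mutex_lock(&devs_lock)` reopens the window for the ABBA deadlock. The back-off avoids the deadlock but can in principle spin for a while under contention; taking both locks in one agreed order on every path is the other usual fix.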

You may want to look at d63c82df8b11b583dec8e72dfb216d8c14783876
(contained in 5.1.0), because this smells like the issue you're facing.

Michal

More information about the libvir-list mailing list