[libvirt] deadlock in remoteDispatchDomainUndefine vs daemonStreamHandleAbort

Michal Privoznik mprivozn at redhat.com
Tue Apr 2 13:30:30 UTC 2019


On 4/2/19 10:45 AM, Christian Ehrhardt wrote:
> On Mon, Apr 1, 2019 at 4:35 PM Michal Privoznik <mprivozn at redhat.com> wrote:
>>
>> On 4/1/19 4:25 PM, Christian Ehrhardt wrote:
>>> Hi,
>>> I happened to analyze a bug [1] report I got from a friend and for
>>> quite a while it was rather elusive. But I now finally got it
>>> reproducible [2] enough to share it with the community.
>>>
>>> The TL;DR of what I see is:
>>> - an automation with python-libvirt gets a SIGINT
>>> - cleanup runs destroy and further undefine
>>> - the guest closes FDs due to SIGINT and/or destroy which triggers
>>> daemonStreamHandleAbort
>>> - those two fight over the lock
>>>
>>> There I get libvirtd into a deadlock which ends up with all threads
>>> dead [4] and two of them fighting [3] (details) in particular.
>>>
>>> The to related stacks summarized are like:
>>>
>>> daemonStreamHandleWrite (failing to write)
>>>    -> daemonStreamHandleAbort (closing things and cleaning up)
>>>       -> ... virChrdevFDStreamCloseCb
>>>           virMutexLock(&priv->devs->lock);
>>>
>>> # there is code meant to avoid such issues emitting "Unable to close"
>>> if a lock is held
>>> # but the log doesn't show this triggering with debug enabled
>>>
>>> #10 seems triggered via an "undefine" call
>>>     remoteDispatchDomainUndefine
>>>     ... -> virChrdevFree
>>>        ... -> virFDStreamSetInternalCloseCb
>>>           -> virObjectLock(virFDStreamDataPtr fdst)
>>>             -> virMutexLock(&obj->lock);
>>>     # closing all streams of a guest (requiring the same locks)
>>>
>>> While that already feels quite close I struggle to see where exactly
>>> we'd want to fix it.
>>> But finally having a repro-script [2] I hope that someone else here
>>> might be able to help me with that.
>>>
>>> After all it is a race - on my s390x system it triggers usually <5
>>> tries, while on x86 I have needed up to 18 runs of the test to hang.
>>> Given different system configs it might be better or worse for you.
>>>
>>> FYI we hit this with libvirt 4.0 initially but libvirt 5.0 was just the same.
>>> I haven't built 5.1 or a recent master, but the commits since 5.0
>>> didn't mention any issue that seems related. OTOH I'm willing and able
>>> to build and try suggestions if anyone comes up with ideas.
>>>
>>> [1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096
>>> [2]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/+attachment/5251655/+files/test4.py
>>> [3]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/comments/3
>>> [4]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/comments/17
>>>
>>
>> You may want to look at d63c82df8b11b583dec8e72dfb216d8c14783876
>> (contained in 5.1.0) beause this smells like the issue you're facing.
> 
> Thanks Michal,
> I agree that this appears to be similar.
> But unfortunately with 5.0 + the full 9 patch series leading into d63c82df still
> triggers the deadlock that we found.
> So it seems to be a new issue :-/
> 
> As I said before - any further suggestions (on commits to test and/or
> how to resolve with new changes) are welcome.
> Thanks in advance!


OKay, how about this then:

diff --git a/src/util/virfdstream.c b/src/util/virfdstream.c
index 1bc43e20a1..0c0bb3ae78 100644
--- a/src/util/virfdstream.c
+++ b/src/util/virfdstream.c
@@ -680,6 +680,9 @@ static int
 virFDStreamCloseInt(virStreamPtr st, bool streamAbort)
 {
     virFDStreamDataPtr fdst;
+    virFDStreamInternalCloseCb icbCb;
+    void *icbOpaque;
+    virFDStreamInternalCloseCbFreeOpaque icbFreeOpaque;
     virStreamEventCallback cb;
     void *opaque;
     int ret;
@@ -730,10 +733,17 @@ virFDStreamCloseInt(virStreamPtr st, bool streamAbort)
 
     /* call the internal stream closing callback */
     if (fdst->icbCb) {
-        /* the mutex is not accessible anymore, as private data is null */
-        (fdst->icbCb)(st, fdst->icbOpaque);
-        if (fdst->icbFreeOpaque)
-            (fdst->icbFreeOpaque)(fdst->icbOpaque);
+        icbCb = fdst->icbCb;
+        icbOpaque = fdst->icbOpaque;
+        icbFreeOpaque = fdst->icbFreeOpaque;
+
+        virObjectUnlock(fdst);
+
+        (icbCb)(st, icbOpaque);
+        if (icbFreeOpaque)
+            (icbFreeOpaque)(icbOpaque);
+
+        virObjectLock(fdst);
     }
 
     if (fdst->dispatching) {



Michal




More information about the libvir-list mailing list