[libvirt] deadlock in remoteDispatchDomainUndefine vs daemonStreamHandleAbort
Michal Privoznik
mprivozn at redhat.com
Tue Apr 2 13:30:30 UTC 2019
On 4/2/19 10:45 AM, Christian Ehrhardt wrote:
> On Mon, Apr 1, 2019 at 4:35 PM Michal Privoznik <mprivozn at redhat.com> wrote:
>>
>> On 4/1/19 4:25 PM, Christian Ehrhardt wrote:
>>> Hi,
>>> I happened to analyze a bug [1] report I got from a friend and for
>>> quite a while it was rather elusive. But I now finally got it
>>> reproducible [2] enough to share it with the community.
>>>
>>> The TL;DR of what I see is:
>>> - an automation with python-libvirt gets a SIGINT
>>> - cleanup runs destroy and further undefine
>>> - the guest closes FDs due to SIGINT and/or destroy which triggers
>>> daemonStreamHandleAbort
>>> - those two fight over the lock
>>>
>>> There I get libvirtd into a deadlock which ends up with all threads
>>> dead [4] and two of them fighting [3] (details) in particular.
>>>
>>> The to related stacks summarized are like:
>>>
>>> daemonStreamHandleWrite (failing to write)
>>> -> daemonStreamHandleAbort (closing things and cleaning up)
>>> -> ... virChrdevFDStreamCloseCb
>>> virMutexLock(&priv->devs->lock);
>>>
>>> # there is code meant to avoid such issues emitting "Unable to close"
>>> if a lock is held
>>> # but the log doesn't show this triggering with debug enabled
>>>
>>> #10 seems triggered via an "undefine" call
>>> remoteDispatchDomainUndefine
>>> ... -> virChrdevFree
>>> ... -> virFDStreamSetInternalCloseCb
>>> -> virObjectLock(virFDStreamDataPtr fdst)
>>> -> virMutexLock(&obj->lock);
>>> # closing all streams of a guest (requiring the same locks)
>>>
>>> While that already feels quite close I struggle to see where exactly
>>> we'd want to fix it.
>>> But finally having a repro-script [2] I hope that someone else here
>>> might be able to help me with that.
>>>
>>> After all it is a race - on my s390x system it triggers usually <5
>>> tries, while on x86 I have needed up to 18 runs of the test to hang.
>>> Given different system configs it might be better or worse for you.
>>>
>>> FYI we hit this with libvirt 4.0 initially but libvirt 5.0 was just the same.
>>> I haven't built 5.1 or a recent master, but the commits since 5.0
>>> didn't mention any issue that seems related. OTOH I'm willing and able
>>> to build and try suggestions if anyone comes up with ideas.
>>>
>>> [1]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096
>>> [2]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/+attachment/5251655/+files/test4.py
>>> [3]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/comments/3
>>> [4]: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1822096/comments/17
>>>
>>
>> You may want to look at d63c82df8b11b583dec8e72dfb216d8c14783876
>> (contained in 5.1.0) beause this smells like the issue you're facing.
>
> Thanks Michal,
> I agree that this appears to be similar.
> But unfortunately with 5.0 + the full 9 patch series leading into d63c82df still
> triggers the deadlock that we found.
> So it seems to be a new issue :-/
>
> As I said before - any further suggestions (on commits to test and/or
> how to resolve with new changes) are welcome.
> Thanks in advance!
OKay, how about this then:
diff --git a/src/util/virfdstream.c b/src/util/virfdstream.c
index 1bc43e20a1..0c0bb3ae78 100644
--- a/src/util/virfdstream.c
+++ b/src/util/virfdstream.c
@@ -680,6 +680,9 @@ static int
virFDStreamCloseInt(virStreamPtr st, bool streamAbort)
{
virFDStreamDataPtr fdst;
+ virFDStreamInternalCloseCb icbCb;
+ void *icbOpaque;
+ virFDStreamInternalCloseCbFreeOpaque icbFreeOpaque;
virStreamEventCallback cb;
void *opaque;
int ret;
@@ -730,10 +733,17 @@ virFDStreamCloseInt(virStreamPtr st, bool streamAbort)
/* call the internal stream closing callback */
if (fdst->icbCb) {
- /* the mutex is not accessible anymore, as private data is null */
- (fdst->icbCb)(st, fdst->icbOpaque);
- if (fdst->icbFreeOpaque)
- (fdst->icbFreeOpaque)(fdst->icbOpaque);
+ icbCb = fdst->icbCb;
+ icbOpaque = fdst->icbOpaque;
+ icbFreeOpaque = fdst->icbFreeOpaque;
+
+ virObjectUnlock(fdst);
+
+ (icbCb)(st, icbOpaque);
+ if (icbFreeOpaque)
+ (icbFreeOpaque)(icbOpaque);
+
+ virObjectLock(fdst);
}
if (fdst->dispatching) {
Michal
More information about the libvir-list
mailing list