[libvirt] Libvirt segfault in qemuMonitorSend() with multi-threaded API use
Adam Litke
agl at us.ibm.com
Fri Mar 5 20:14:07 UTC 2010
Daniel, thanks for the help. I was able to fix the problem (see my post
in a new thread).
On Fri, 2010-03-05 at 09:32 +0000, Daniel P. Berrange wrote:
> On Thu, Mar 04, 2010 at 02:22:35PM -0600, Adam Litke wrote:
> > I have a multi-threaded Python program that shares a single libvirt
> > connection object among several threads (one thread per active domain on
> > the system plus a management thread). On a heavily loaded host with 8
> > running domains I am getting a consistent libvirtd segfault in the qemu
> > monitor handling code. This happens with libvirt-0.7.6 and git.
> >
> > Mar 4 12:23:13 bc1cn7-mgmt kernel: [ 3947.836151] libvirtd[7716]:
> > segfault at 24 ip 000000000045de5c sp 00007fe5aa7d2b20 error 4 in
> > libvirtd[400000+b3000]
> >
> > Using addr2line, this translates to: libvirt/src/qemu/qemu_monitor.c:698
> >
> > Which is in qemuMonitorSend():
> >
> > -->     while (!mon->msg->finished) {
> >             if (virCondWait(&mon->notify, &mon->lock) < 0)
> >                 goto cleanup;
> >         }
> >
> > It seems that mon->msg is being reset to NULL in the middle of this loop
> > execution. I suspect that is because qemuMonitorSend() is not reentrant
> > and multiple threads in my program are racing here. I would guess the
> > 'mon->msg = NULL;' on line 707 causes the NULL that trips up the other
> > racer.
>
> > I presume the Monitor interface has some locking protection around it to
> > ensure that only one thread can use it at a time?
>
> You are correct that qemuMonitorSend() is not re-entrant. qemuMonitorSend()
> is invoked by any of the qemuMonitorXXXX() APIs. For all these APIs, the
> QEMU driver code is required to first hold the lock by calling
> qemuDomainObjEnterMonitor() and release it when done by calling
> qemuDomainObjExitMonitor().
>
> e.g.,
>
>     qemuDomainObjEnterMonitor(obj);
>     naddrs = qemuMonitorGetAllPCIAddresses(priv->mon,
>                                            &addrs);
>     qemuDomainObjExitMonitor(obj);
>
> > Is there an easy way to fix this? I am not familiar with the measures
> > employed to make libvirt thread-safe. Thanks!
>
> The first step is to try to identify which functions were run concurrently.
>
> Try running libvirtd with:
>
>     LIBVIRT_LOG_FILTERS=1:qemu LIBVIRT_LOG_OUTPUTS=1:stderr
>
>
> You'll get quite a lot of data printed out for all monitor calls, which might
> let you see which ones overlap. You might want to add further log messages in
> the qemuMonitorSend() method itself to help with this.
>
> There is a small chance that using GDB 'thread apply all backtrace' when
> it crashes will show you info, but that's fairly unlikely.
>
> The other possibility is buffer corruption in the qemuMonitor struct, but
> that seems less likely.
>
> Regards,
> Daniel
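
For anyone who finds this thread later, here is a rough Python sketch of
the locking rule discussed above. It is a toy model, not libvirt code:
Monitor, Message, io_loop, monitor_send, safe_command, run_demo and the
"job" lock are all made-up names, and the "job" lock only stands in for
the qemuDomainObjEnterMonitor()/qemuDomainObjExitMonitor() pairing. It
assumes the failure mode guessed at above (a second sender overwriting
and then clearing the shared message pointer) and shows how serializing
callers avoids it.

```python
import threading


class Message:
    """Hypothetical in-flight monitor command."""

    def __init__(self, text):
        self.text = text
        self.finished = False
        self.reply = None


class Monitor:
    """Toy stand-in for the monitor state: one lock, one condition, one message slot."""

    def __init__(self):
        self.lock = threading.Lock()
        self.notify = threading.Condition(self.lock)
        self.msg = None
        self.closed = False
        # Assumption: models the "enter/exit monitor" rule, one caller at a time.
        self.job = threading.Lock()


def io_loop(mon):
    """Pretend to be the guest side: complete whatever message is in flight."""
    with mon.lock:
        while not mon.closed:
            if mon.msg is not None and not mon.msg.finished:
                mon.msg.reply = "ok: " + mon.msg.text
                mon.msg.finished = True
                mon.notify.notify_all()
            else:
                mon.notify.wait()


def monitor_send(mon, msg):
    """Non-reentrant send: only correct if callers are already serialized."""
    with mon.lock:
        mon.msg = msg
        mon.notify.notify_all()
        # If a second sender got in here, it would overwrite mon.msg and later
        # reset it to None, and this loop would fail on mon.msg.finished.
        while not mon.msg.finished:
            mon.notify.wait()
        mon.msg = None
    return msg.reply


def safe_command(mon, text):
    """Take the per-monitor job lock around the send, then release it."""
    with mon.job:
        return monitor_send(mon, Message(text))


def run_demo(n_threads=8, n_calls=50):
    """Hammer one monitor from several threads and collect the replies."""
    mon = Monitor()
    io = threading.Thread(target=io_loop, args=(mon,))
    io.start()
    results = []
    results_lock = threading.Lock()

    def worker(i):
        for j in range(n_calls):
            reply = safe_command(mon, f"cmd-{i}-{j}")
            with results_lock:
                results.append(reply)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with mon.lock:
        mon.closed = True
        mon.notify.notify_all()
    io.join()
    return results
```

Calling run_demo() should return one reply per command. If the workers
call monitor_send() directly instead of safe_command(), the toy model
can fail inside the wait loop in the way described in the comment.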
--
Thanks,
Adam