[libvirt] [PATCHv3 0/6] Fix memory corruption/crash in the connection close callback

Peter Krempa pkrempa at redhat.com
Mon Apr 8 12:06:27 UTC 2013


On 04/08/13 13:55, Viktor Mihajlovski wrote:
> I fear we're not yet through this. Today I had a segfault doing a migration
> using virsh migrate --verbose --live $guest qemu+ssh://$host/system.
> This is with Friday's git HEAD.
> The migration took very long (but succeeded, except for the libvirt
> crash), so there still seems to be a race lingering in the object
> reference counting, exposed by the --verbose option (getjobinfo?).
>
> (gdb) bt
> #0  qemuDomainGetJobInfo (dom=<optimized out>, info=0x3fffaaaaa70) at qemu/qemu_driver.c:10166
> #1  0x000003fffd4bbe68 in virDomainGetJobInfo (domain=0x3ffe4002660, info=0x3fffaaaaa70) at libvirt.c:17440
> #2  0x000002aace36b528 in remoteDispatchDomainGetJobInfo (server=<optimized out>, msg=<optimized out>, ret=0x3ffe40029d0,
>      args=0x3ffe40026a0, rerr=0x3fffaaaac20, client=<optimized out>) at remote_dispatch.h:2069
> #3  remoteDispatchDomainGetJobInfoHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>,
>      rerr=0x3fffaaaac20, args=0x3ffe40026a0, ret=0x3ffe40029d0) at remote_dispatch.h:2045
> #4  0x000003fffd500384 in virNetServerProgramDispatchCall (msg=0x2ab035dd800, client=0x2ab035df5d0, server=0x2ab035ca370,
>      prog=0x2ab035cf210) at rpc/virnetserverprogram.c:439
> #5  virNetServerProgramDispatch (prog=0x2ab035cf210, server=0x2ab035ca370, client=0x2ab035df5d0, msg=0x2ab035dd800)
>      at rpc/virnetserverprogram.c:305
> #6  0x000003fffd4fad3c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>,
>      srv=0x2ab035ca370) at rpc/virnetserver.c:162
> #7  virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x2ab035ca370) at rpc/virnetserver.c:183
> #8  0x000003fffd42a91c in virThreadPoolWorker (opaque=opaque at entry=0x2ab035a9e60) at util/virthreadpool.c:144
> #9  0x000003fffd42a236 in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:161
> #10 0x000003fffcdee412 in start_thread () from /lib64/libpthread.so.0
> #11 0x000003fffcd30056 in thread_start () from /lib64/libc.so.6
>
> (gdb) l
> 10161	    if (!(vm = qemuDomObjFromDomain(dom)))
> 10162	        goto cleanup;
> 10163	
> 10164	    priv = vm->privateData;
> 10165	
> 10166	    if (virDomainObjIsActive(vm)) {
> 10167	        if (priv->job.asyncJob && !priv->job.dump_memory_only) {
> 10168	            memcpy(info, &priv->job.info, sizeof(*info));
> 10169	
> 10170	            /* Refresh elapsed time again just to ensure it
>
>
> (gdb) print *vm
> $1 = {parent = {parent = {magic = 3735928559, refs = 0, klass = 0xdeadbeef}, lock = {lock = {__data = {__lock = 0,
>            __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>          __size = '\000' <repeats 39 times>, __align = 0}}}, pid = 0, state = {state = 0, reason = 0}, autostart = 0,
>    persistent = 0, updated = 0, def = 0x0, newDef = 0x0, snapshots = 0x0, current_snapshot = 0x0, hasManagedSave = false,
>    privateData = 0x0, privateDataFreeFunc = 0x0, taint = 0}
>
> I am currently blocked with other work, but if anyone has a theory that
> I should verify, let me know...
>
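
The object dump already gives the story away: 3735928559 is 0xDEADBEEF,
the poison value virObjectUnref() writes into the magic and klass fields
once the last reference is dropped, with the rest of the structure
zeroed before the memory is released. So the domain object had already
been freed by the time qemuDomainGetJobInfo() ran, and
virDomainObjIsActive() then crashed dereferencing the now-NULL vm->def.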

Aiee, perhaps a race between a thread freeing the domain object (and 
its private data) and another thread that acquired the domain object 
pointer just before it was freed? Let me verify whether that is 
possible.
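
Roughly the interleaving I have in mind, sketched with made-up types
and helper names rather than the real libvirt ones (the actual lookup
goes through qemuDomObjFromDomain() and the domain object list): if the
lookup hands out the object pointer without taking a reference of its
own, a close callback running in parallel can drop the last reference
and free the object between the lookup and the dereference.

    #include <pthread.h>
    #include <stdint.h>

    /* Made-up stand-ins for the real libvirt types and helpers. */
    typedef struct {
        uint32_t magic;           /* becomes 0xDEADBEEF once freed */
        int refs;
        void *privateData;
    } domain_obj;

    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Racy variant: the pointer escapes the list lock without a
     * reference of its own, so a concurrent unref can free the object
     * before the caller ever touches it. */
    static domain_obj *lookup_racy(domain_obj *obj)
    {
        pthread_mutex_lock(&list_lock);
        /* ... find obj in the list ... */
        pthread_mutex_unlock(&list_lock);
        return obj;               /* may already be freed when used */
    }

    /* Fixed variant: take a reference while the list lock is still
     * held, so the close callback can only drop *its own* reference. */
    static domain_obj *lookup_ref(domain_obj *obj)
    {
        pthread_mutex_lock(&list_lock);
        /* ... find obj in the list ... */
        obj->refs++;              /* virObjectRef() in the real code */
        pthread_mutex_unlock(&list_lock);
        return obj;               /* caller must unref when done */
    }

If that is the interleaving we are hitting, the lookup has to return a
referenced object and the callers have to unref it once they are done.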

Peter



