[libvirt] [PATCHv3 0/6] Fix memory corruption/crash in the connection close callback

Peter Krempa pkrempa at redhat.com
Mon Apr 8 12:06:27 UTC 2013


On 04/08/13 13:55, Viktor Mihajlovski wrote:
> I fear we're not yet through this. Today I had a segfault doing a migration
> using virsh migrate --verbose --live $guest qemu+ssh://$host/system.
> This is with Friday's git HEAD.
> The migration took very long (but succeeded, except for the libvirt
> crash), so there still seems to be a race lingering in the object
> reference counting, exposed by the --verbose option (getjobinfo?).
>
> (gdb) bt
> #0  qemuDomainGetJobInfo (dom=<optimized out>, info=0x3fffaaaaa70) at qemu/qemu_driver.c:10166
> #1  0x000003fffd4bbe68 in virDomainGetJobInfo (domain=0x3ffe4002660, info=0x3fffaaaaa70) at libvirt.c:17440
> #2  0x000002aace36b528 in remoteDispatchDomainGetJobInfo (server=<optimized out>, msg=<optimized out>, ret=0x3ffe40029d0,
>      args=0x3ffe40026a0, rerr=0x3fffaaaac20, client=<optimized out>) at remote_dispatch.h:2069
> #3  remoteDispatchDomainGetJobInfoHelper (server=<optimized out>, client=<optimized out>, msg=<optimized out>,
>      rerr=0x3fffaaaac20, args=0x3ffe40026a0, ret=0x3ffe40029d0) at remote_dispatch.h:2045
> #4  0x000003fffd500384 in virNetServerProgramDispatchCall (msg=0x2ab035dd800, client=0x2ab035df5d0, server=0x2ab035ca370,
>      prog=0x2ab035cf210) at rpc/virnetserverprogram.c:439
> #5  virNetServerProgramDispatch (prog=0x2ab035cf210, server=0x2ab035ca370, client=0x2ab035df5d0, msg=0x2ab035dd800)
>      at rpc/virnetserverprogram.c:305
> #6  0x000003fffd4fad3c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>,
>      srv=0x2ab035ca370) at rpc/virnetserver.c:162
> #7  virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x2ab035ca370) at rpc/virnetserver.c:183
> #8  0x000003fffd42a91c in virThreadPoolWorker (opaque=opaque at entry=0x2ab035a9e60) at util/virthreadpool.c:144
> #9  0x000003fffd42a236 in virThreadHelper (data=<optimized out>) at util/virthreadpthread.c:161
> #10 0x000003fffcdee412 in start_thread () from /lib64/libpthread.so.0
> #11 0x000003fffcd30056 in thread_start () from /lib64/libc.so.6
>
> (gdb) l
> 10161	    if (!(vm = qemuDomObjFromDomain(dom)))
> 10162	        goto cleanup;
> 10163	
> 10164	    priv = vm->privateData;
> 10165	
> 10166	    if (virDomainObjIsActive(vm)) {
> 10167	        if (priv->job.asyncJob && !priv->job.dump_memory_only) {
> 10168	            memcpy(info, &priv->job.info, sizeof(*info));
> 10169	
> 10170	            /* Refresh elapsed time again just to ensure it
>
>
> (gdb) print *vm
> $1 = {parent = {parent = {magic = 3735928559, refs = 0, klass = 0xdeadbeef}, lock = {lock = {__data = {__lock = 0,
>            __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
>          __size = '\000' <repeats 39 times>, __align = 0}}}, pid = 0, state = {state = 0, reason = 0}, autostart = 0,
>    persistent = 0, updated = 0, def = 0x0, newDef = 0x0, snapshots = 0x0, current_snapshot = 0x0, hasManagedSave = false,
>    privateData = 0x0, privateDataFreeFunc = 0x0, taint = 0}
>
> I am currently blocked with other work, but if anyone has a theory that
> I should verify, let me know...
>
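
The object dump already gives the story away: 3735928559 is 0xDEADBEEF,
the poison value virObjectUnref() writes into the magic and klass fields
once the last reference is dropped, with the rest of the structure
zeroed before the memory is released. So the domain object had already
been freed by the time qemuDomainGetJobInfo() ran, and
virDomainObjIsActive() then crashed dereferencing the now-NULL vm->def.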

Aiee, perhaps a race between a thread freeing the domain object (and 
its private data) and another thread that acquired the domain object 
pointer just before it was freed? Let me verify whether that is 
possible.
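
Roughly the interleaving I have in mind, sketched with made-up types
and helper names rather than the real libvirt ones (the actual lookup
goes through qemuDomObjFromDomain() and the domain object list): if the
lookup hands out the object pointer without taking a reference of its
own, a close callback running in parallel can drop the last reference
and free the object between the lookup and the dereference.

    #include <pthread.h>
    #include <stdint.h>

    /* Made-up stand-ins for the real libvirt types and helpers. */
    typedef struct {
        uint32_t magic;           /* becomes 0xDEADBEEF once freed */
        int refs;
        void *privateData;
    } domain_obj;

    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Racy variant: the pointer escapes the list lock without a
     * reference of its own, so a concurrent unref can free the object
     * before the caller ever touches it. */
    static domain_obj *lookup_racy(domain_obj *obj)
    {
        pthread_mutex_lock(&list_lock);
        /* ... find obj in the list ... */
        pthread_mutex_unlock(&list_lock);
        return obj;               /* may already be freed when used */
    }

    /* Fixed variant: take a reference while the list lock is still
     * held, so the close callback can only drop *its own* reference. */
    static domain_obj *lookup_ref(domain_obj *obj)
    {
        pthread_mutex_lock(&list_lock);
        /* ... find obj in the list ... */
        obj->refs++;              /* virObjectRef() in the real code */
        pthread_mutex_unlock(&list_lock);
        return obj;               /* caller must unref when done */
    }

If that is the interleaving we are hitting, the lookup has to return a
referenced object and the callers have to unref it once they are done.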

Peter



