[libvirt] Segfault in event-test.c example

Matthias Bolte matthias.bolte at googlemail.com
Tue Jan 12 01:33:59 UTC 2010


2010/1/12  <pspreadborough at comcast.net>:
>
> ----- pspreadborough at comcast.net wrote:
>
>> ----- "Matthias Bolte" <matthias.bolte at googlemail.com> wrote:
>>
>> > 2010/1/11  <pspreadborough at comcast.net>:
>> > >
>> > > ----- "Matthias Bolte" <matthias.bolte at googlemail.com> wrote:
>> > >
>> > >> 2010/1/10  <pspreadborough at comcast.net>:
>> > >> >
>> > >> > ----- "Matthias Bolte" <matthias.bolte at googlemail.com> wrote:
>> > >> >
>> > >> >> 2010/1/10  <pspreadborough at comcast.net>:
>> > >> >> >
>> > >> >> > Hello,
>> > >> >> >
>> > >> >> > I have been trying to use the domain event C code example but
>> > >> >> > unfortunately it segfaults (signal 11) every time I run it:
>> > >> >> >
>> > >> >> > [root at Spring events-c]# ./event-test
>> > >> >> > myEventAddHandleFunc:221: Add handle 5 1 0xf081a0 0x8f727f8
>> > >> >> > myEventAddHandleFunc:221: Add handle 7 1 0xf09990 0x8f727f8
>> > >> >> > myEventAddHandleFunc:221: Add handle 8 1 0xed7940 0x8f727f8
>> > >> >> > myEventAddTimeoutFunc:251: Adding Timeout -1 0xedefa0 0x8f727f8
>> > >> >> > myEventAddHandleFunc:221: Add handle 11 1 0xed7940 0x8f727f8
>> > >> >> > myEventAddTimeoutFunc:251: Adding Timeout -1 0xedefa0 0x8f727f8
>> > >> >> > main:322 :: Registering domain event cbs
>> > >> >> > Segmentation fault (core dumped)
>> > >> >> >
>> > >> >> >  Core was generated by
>> > >> >> > `/root/libvirt-0.7.5/examples/domain-events/events-c/.libs/lt-event-test'.
>> > >> >> > Program terminated with signal 11, Segmentation fault.
>> > >> >> > [New process 21806]
>> > >> >> > [New process 21822]
>> > >> >> > #0  remoteDomainEventQueueFlush (timer=-1, opaque=0x8f727f8) at remote/remote_driver.c:8720
>> > >> >> > 8720        tempQueue.count = priv->domainEvents->count;
>> > >> >> > (gdb) bt
>> > >> >> > #0  remoteDomainEventQueueFlush (timer=-1, opaque=0x8f727f8) at remote/remote_driver.c:8720
>> > >> >> > #1  0x080490d3 in main (argc=Cannot access memory at address 0x1
>> > >> >> > ) at event-test.c:347
>> > >> >> >
>> > >> >> > The stack looks corrupted, so I'm doubtful that this trace is
>> > >> >> > of much value. I have built and installed libvirt-0.7.5, and
>> > >> >> > it and its tools seem to be operating correctly.
>> > >> >>
>> > >> >> I tried the event-test with libvirt-0.7.5 and QEMU/Xen and both
>> > >> >> are working as expected. No segfaults.
>> > >> >>
>> > >> >> Could you inspect the values of priv and priv->domainEvents in
>> > >> >> GDB using 'p priv' to see if they are NULL and try to
>> > >> >> dereference them in GDB using 'p *priv' to see if they point to
>> > >> >> valid memory areas?
>> > >> >>
>> > >> >> Yes the backtrace looks corrupted. If there is stack/heap
>> > >> >> corruption involved valgrind may reveal it, so try to run the
>> > >> >> event-test in valgrind and see if that gives any hints.
>> > >> >>
>> > >> >> You can also try the GIT version of libvirt. There was an
>> > >> >> invalid free call (resulting in heap corruption) in the node
>> > >> >> device code that was fixed after the 0.7.5 release. But that
>> > >> >> should have no effect on the event-test.
>> > >> >>
>> > >> >> Matthias
>> > >> >
>> > >> > Matthias,
>> > >> >
>> > >> > priv->domainEvents is NULL, here's the gdb output:
>> > >>
>> > >> This explains the segfault. The next question is, why is it NULL?
>> > >>
>> > >> > (gdb) p *priv
>> > >> > $1 = {lock = {lock = {__data = {__lock = 1, __count = 0,
>> __owner
>> > =
>> > >> 21806, __kind = 0, __nusers = 1, {__spins = 0, __list = {
>> > >> >            __next = 0x0}}}, __size =
>> > >>
>> >
>> "\001\000\000\000\000\000\000\000.U\000\000\000\000\000\000\001\000\000\000\000\000\000",
>> > >> >      __align = 1}}, sock = 150469168, watch = 3, pid = 4,
>> > uses_tls =
>> > >> 1982791681, is_secure = 1815048801, session = 0x782f6269,
>> > >>
>> > >> Seeing uses_tls and is_secure being large numbers, and knowing that
>> > >> both are used as boolean values in the code and should have values
>> > >> of 0 or 1, makes me think that priv points to already freed memory
>> > >> here.
>> > >>
>> > >> >  type = 0x2f646e65 <Address 0x2f646e65 out of bounds>, counter
>> =
>> > >> 1684956536, localUses = 1668248365,
>> > >> >  hostname = 0x74656b <Address 0x74656b out of bounds>, debugLog
>> > =
>> > >> 0x0, saslconn = 0x0, saslDecoded = 0x0, saslDecodedLength = 0,
>> > >>
>> > >> type and hostname are char pointers, but they seem to point into
>> > >> nowhere, which confirms that this is either freed memory or that
>> > >> priv itself got overwritten due to heap corruption.
>> > >>
>> > >> >  saslDecodedOffset = 0, saslEncoded = 0x0, saslEncodedLength =
>> > 0,
>> > >> saslEncodedOffset = 0,
>> > >> >  buffer = '\0' <repeats 68 times>,
>> > >>
>> >
>> "n\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\bQ�\b����\030\034�\b\000\000\000\000�\033�\b\001\000\000\000\002\000\000\000�\033�\b\000\000\000\000\025|�\000\a",
>> > >> '\0' <repeats 11 times>, "X\000�\b", '\0' <repeats 12 times>,
>> > >> "\021\000\000\000\002\000\000\000P��\b\000\000\000\000\021", '\0'
>> > >> <repeats 15 times>,
>> > >>
>> >
>> "\021\000\000\0008\036�\b\f\000\000\000\020\000\000\000\021\000\000\000\a\000\000\000\b\000\000\000\t\000\000\000\021\000\000\000\002\000\000\000\230\034�\b\000\000\000\000A\000\000\000\003\000\000\000\001\000\000\000\001\000"...,
>> > >> bufferLength = 0,
>> > >> >  bufferOffset = 0, callbackList = 0x0, domainEvents = 0x0,
>> > >> eventFlushTimer = 0, domainEventDispatching = 1, wakeupSendFD =
>> 0,
>> > >> >  wakeupReadFD = 0, waitDispatch = 0x0, streams = 0x0}
>> > >> >
>> > >> > I'll try a run with valgrind and post the results.
>> > >> >
>> > >> > Pete
>> > >> >
>> > >>
>> > >> Could you test the Python version of this example found in
>> > >> examples/domain-events/events-python/event-test.py? Does this
>> > >> work?
>> > >>
>> > >> Otherwise let's see if valgrind gives any hints.
>> > >>
>> > >> Matthias
>> > >
>> > >
>> > > During initialization I notice that the myEventAddHandleFunc()
>> > > function is called multiple times, each time with a different fd
>> > > value (5, 7, 8 and 11). The way the code is written, only the last
>> > > fd value is recorded and then used in the poll() call. Is this
>> > > intended? If so, why are the preceding fds ignored?
>> > >
>> > > myEventAddHandleFunc:223: Add handle 5 1 0xf13480 0x97b97d8
>> > > myEventAddHandleFunc:223: Add handle 7 1 0xf14c70 0x97b97d8
>> > > Allocating domainEvents:0x97c6b10
>> > > myEventAddHandleFunc:223: Add handle 8 1 0xee2940 0x97b97d8
>> > > myEventAddTimeoutFunc:260: Adding Timeout -1 0xee9fc0 0x97b97d8
>> > > Allocating domainEvents:0x97c5780
>> > > myEventAddHandleFunc:223: Add handle 11 1 0xee2940 0x97b97d8
>> > > myEventAddTimeoutFunc:260: Adding Timeout -1 0xee9fc0 0x97b97d8
>> > > main:333 :: Registering domain event cbs
>> > >
>> > >
>> > > Regards,
>> > >
>> > > Pete
>> > >
>> >
>> > That's strange. I can't reproduce this either. I always get exactly
>> > one call to myEventAddHandleFunc:
>> >
>> > myEventAddHandleFunc:221: Add handle 3 1 0x7ff116f68b00 0x1d97f00
>> > myEventAddTimeoutFunc:251: Adding Timeout -1 0x7ff116f68750
>> 0x1d97f00
>> > main:322 :: Registering domain event cbs
>> > myEventUpdateHandleFunc:232: Updated Handle 0 0
>> > myEventUpdateHandleFunc:232: Updated Handle 0 1
>> >
>> > You could try to run the event-test in GDB and set a breakpoint on
>> > myEventAddHandleFunc to see where 4 additional calls to
>> > myEventAddHandleFunc come from.
>> >
>> > In my case I get this backtrace when setting a breakpoint on
>> > myEventAddHandleFunc:
>> >
>> > (gdb) bt
>> > #0  myEventAddHandleFunc (fd=6, event=1, cb=0x7f1be6ec3b00
>> > <remoteDomainEventFired>, opaque=0xc68f00, ff=0) at event-test.c:220
>> > #1  0x00007f1be6ecbaaf in doRemoteOpen (conn=0xc68f00,
>> > priv=0x7f1be7350010, auth=0x0, flags=0) at
>> remote/remote_driver.c:893
>> > #2  0x00007f1be6ece053 in remoteOpen (conn=0xc68f00, auth=0x0,
>> > flags=13007744) at remote/remote_driver.c:1076
>> > #3  0x00007f1be6eb155d in do_open (name=0x7fff48400968
>> > "qemu:///system", auth=0x0, flags=0) at libvirt.c:1117
>> > #4  0x0000000000400eb3 in main (argc=<value optimized out>,
>> > argv=<value optimized out>) at event-test.c:313
>> >
>> > Matthias
>>
>>
>> Matthias
>>
>> Here are the four stack traces, one for each time
>> myEventAddHandleFunc() was
>> called.
>>
>> #0  myEventAddHandleFunc (fd=8, event=1, cb=0x85b480
>> <xenStoreWatchEvent>, opaque=0x824a7d8, ff=0) at event-test.c:223
>> #1  0x007ccf55 in virEventAddHandle (fd=8, events=1, cb=0x85b480
>> <xenStoreWatchEvent>, opaque=0x824a7d8, ff=0)
>>     at util/event.c:45
>> #2  0x0085b291 in xenStoreOpen (conn=0x824a7d8, auth=0x0, flags=<value
>> optimized out>) at xen/xs_internal.c:339
>> #3  0x00844287 in xenUnifiedOpen (conn=0x824a7d8, auth=0x0, flags=0)
>> at xen/xen_driver.c:352
>> #4  0x00811d05 in do_open (name=0xbf8c49d4 "xen:///", auth=0x0,
>> flags=0) at libvirt.c:1117
>> #5  0x08048e92 in main (argc=Cannot access memory at address 0x2
>> ) at event-test.c:325
>> (gdb) c
>> Continuing.
>> (gdb) b
>> Note: breakpoint 1 also set at pc 0x8048bc9.
>> Breakpoint 2 at 0x8048bc9: file event-test.c, line 223.
>> (gdb) bt
>> #0  myEventAddHandleFunc (fd=10, event=1, cb=0x85cc70
>> <xenInotifyEvent>, opaque=0x824a7d8, ff=0) at event-test.c:223
>> #1  0x007ccf55 in virEventAddHandle (fd=10, events=1, cb=0x85cc70
>> <xenInotifyEvent>, opaque=0x824a7d8, ff=0) at util/event.c:45
>> #2  0x0085c827 in xenInotifyOpen (conn=0x824a7d8, auth=0x0, flags=0)
>> at xen/xen_inotify.c:460
>> #3  0x008444f1 in xenUnifiedOpen (conn=0x824a7d8, auth=0x0, flags=0)
>> at xen/xen_driver.c:391
>> #4  0x00811d05 in do_open (name=0xbf8c49d4 "xen:///", auth=0x0,
>> flags=0) at libvirt.c:1117
>> #5  0x08048e92 in main (argc=Cannot access memory at address 0x1
>> ) at event-test.c:325
>> (gdb) c
>> Continuing.
>> (gdb) bt
>> #0  myEventAddHandleFunc (fd=11, event=1, cb=0x82a940
>> <remoteDomainEventFired>, opaque=0x824a7d8, ff=0) at event-test.c:223
>> #1  0x007ccf55 in virEventAddHandle (fd=11, events=1, cb=0x82a940
>> <remoteDomainEventFired>, opaque=0x824a7d8, ff=0)
>>     at util/event.c:45
>> #2  0x0082c478 in doRemoteOpen (conn=0x824a7d8, priv=0xb7534008,
>> auth=0x0, flags=0) at remote/remote_driver.c:894
>> #3  0x00830448 in remoteOpenSecondaryDriver (conn=0x824a7d8, auth=0x0,
>> flags=0, priv=0xbf8c2668) at remote/remote_driver.c:1006
>> #4  0x0083082a in remoteNetworkOpen (conn=0x824a7d8, auth=0x0,
>> flags=0) at remote/remote_driver.c:3549
>> #5  0x00811e2f in do_open (name=0xbf8c49d4 "xen:///", auth=0x0,
>> flags=0) at libvirt.c:1137
>> #6  0x08048e92 in main (argc=1, argv=0xbf8c28b4) at event-test.c:325
>> (gdb) c
>> Continuing.
>> (gdb) bt
>> #0  myEventAddHandleFunc (fd=14, event=1, cb=0x82a940
>> <remoteDomainEventFired>, opaque=0x824a7d8, ff=0) at event-test.c:223
>> #1  0x007ccf55 in virEventAddHandle (fd=14, events=1, cb=0x82a940
>> <remoteDomainEventFired>, opaque=0x824a7d8, ff=0)
>>     at util/event.c:45
>> #2  0x0082c478 in doRemoteOpen (conn=0x824a7d8, priv=0x8258ab0,
>> auth=0x0, flags=0) at remote/remote_driver.c:894
>> #3  0x00830448 in remoteOpenSecondaryDriver (conn=0x824a7d8, auth=0x0,
>> flags=0, priv=0xbf8c2668) at remote/remote_driver.c:1006
>> #4  0x0083078a in remoteInterfaceOpen (conn=0x824a7d8, auth=0x0,
>> flags=0) at remote/remote_driver.c:4104
>> #5  0x00811f4e in do_open (name=0xbf8c49d4 "xen:///", auth=0x0,
>> flags=0) at libvirt.c:1156
>> #6  0x08048e92 in main (argc=1, argv=0xbf8c28b4) at event-test.c:325
>>
>> Regards,
>>
>> Pete
>>
>>
>> --
>> Libvir-list mailing list
>> Libvir-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/libvir-list
>
> Matthias,
>
> Using the URI xen+unix:/// works! I'm curious to know why multiple
> myEventAddHandleFunc() calls occurred when using the URI xen:///.
> Are there any fields I could use to identify and ignore
> the spurious calls?
>
> I'd like to use the event monitoring for several remote Xen hosts.
> I assume I'll have to modify the code to connect remotely to each
> host and hope for just one myEventAddHandleFunc() call per host.
>
> Thanks for your assistance, it's much appreciated.
>
> Regards,
>
> Pete
>
>

Ah, you are using the event-test from within a dom0. If I do that, I can
reproduce the segfault. I also understand now why the Python version
works.

The C version is basically built to handle a single call to
myEventAddHandleFunc. This works with xen+unix:/// because then only the
remote driver is involved, and it adds just one event handle. With
xen:/// several Xen subdrivers register event handles, and
myEventAddHandleFunc just overwrites the values stored for the
previously added event handle. For some reason this results in the
segfault you see. The Python version works because it handles multiple
added handles properly.
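
For illustration, here is a minimal sketch of how the C example could
keep every registered handle instead of only the last one. This is not
the code from event-test.c and not a tested fix: the names handle_cb,
free_cb, my_add_handle, my_remove_handle, my_handle_count, my_run_once
and count_cb are hypothetical, the callback types are local stand-ins
so the sketch compiles without libvirt headers, and the event mask is
assumed to be a plain poll(2) mask rather than libvirt's own event
flags.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-ins for the library's callback types. */
typedef void (*handle_cb)(int watch, int fd, int events, void *opaque);
typedef void (*free_cb)(void *opaque);

#define MAX_HANDLES 16

struct handle {
    int watch;      /* id handed back to the caller */
    int fd;
    int events;     /* assumed to be a poll(2) mask such as POLLIN */
    handle_cb cb;
    void *opaque;
    free_cb ff;
    int in_use;
};

static struct handle handles[MAX_HANDLES];
static int next_watch = 1;

/* Store each handle in its own slot instead of overwriting one global. */
int my_add_handle(int fd, int events, handle_cb cb, void *opaque, free_cb ff)
{
    assert(cb != NULL);
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (!handles[i].in_use) {
            handles[i] = (struct handle){ next_watch++, fd, events,
                                          cb, opaque, ff, 1 };
            return handles[i].watch;
        }
    }
    return -1; /* table full */
}

/* Release one handle; returns 0 on success, -1 if the watch is unknown. */
int my_remove_handle(int watch)
{
    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (handles[i].in_use && handles[i].watch == watch) {
            if (handles[i].ff)
                handles[i].ff(handles[i].opaque);
            handles[i].in_use = 0;
            return 0;
        }
    }
    return -1;
}

size_t my_handle_count(void)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_HANDLES; i++)
        n += handles[i].in_use ? 1 : 0;
    return n;
}

/* Poll all registered fds once and dispatch to the matching callbacks.
 * Returns the value of poll(): number of ready fds, 0 on timeout, -1 on error. */
int my_run_once(int timeout_ms)
{
    struct pollfd fds[MAX_HANDLES];
    size_t slot[MAX_HANDLES];
    int watch[MAX_HANDLES];
    size_t n = 0;

    for (size_t i = 0; i < MAX_HANDLES; i++) {
        if (handles[i].in_use) {
            fds[n].fd = handles[i].fd;
            fds[n].events = (short)handles[i].events;
            fds[n].revents = 0;
            slot[n] = i;
            watch[n] = handles[i].watch;
            n++;
        }
    }

    int ready = poll(fds, (nfds_t)n, timeout_ms);
    if (ready <= 0)
        return ready;

    for (size_t k = 0; k < n; k++) {
        struct handle *h = &handles[slot[k]];
        /* Skip slots that a previous callback removed or replaced. */
        if (fds[k].revents && h->in_use && h->watch == watch[k])
            h->cb(h->watch, h->fd, fds[k].revents, h->opaque);
    }
    return ready;
}

/* Example callback: consume one byte and count the invocation. */
void count_cb(int watch, int fd, int events, void *opaque)
{
    char byte;
    (void)watch;
    (void)events;
    if (read(fd, &byte, 1) == 1)
        ++*(int *)opaque;
}
```

With a table like this, each subdriver's fd gets its own slot and its
own callback, so a later registration cannot clobber an earlier one. A
real implementation would also need matching update and timeout
handling, which this sketch leaves out.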

I tried to understand why this triggers a segfault, but have had no
success yet.

Matthias



