[libvirt] virsh reconnect vs. keepalive issue

Martin Kletzander mkletzan at redhat.com
Fri May 23 11:29:34 UTC 2014


On Tue, May 20, 2014 at 05:49:54PM -0600, Eric Blake wrote:
>I'm trying to debug a new virsh command, and had two terminals up, one
>running './run gdb tools/virsh' (I originally ran virsh directly, but
>re-ran under gdb to get a trace) and the other running './run gdb
>daemon/libvirtd'.  Because I forgot to use the -k0 argument to disable
>keepalives, and took too long while at a breakpoint on the libvirtd
>terminal, my virsh session tried to report a keepalive failure:
>
>[New Thread 0x7fffe7b8c700 (LWP 7143)]
>2014-05-20 23:40:22.337+0000: 7138: info : libvirt version: 1.2.5
>2014-05-20 23:40:22.337+0000: 7138: warning :
>virKeepAliveTimerInternal:143 : No response from client 0x55555586a520
>after 6 keepalive messages in 35 seconds
>2014-05-20 23:40:22.337+0000: 7143: warning :
>virKeepAliveTimerInternal:143 : No response from client 0x55555586a520
>after 6 keepalive messages in 35 seconds
>error: internal error: received hangup / error event on socket
>

I tried to reproduce that and I get no hang.  The only difference I
see is that without -k0 I *sometimes* get:

warning : virKeepAliveTimer:177 : Failed to send keepalive request to
client 0x55555583d630

Other than that, everything works fine, and the reconnection only
happens once another command is run.  Are you using some kind of
special authentication?
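
For what it's worth, the numbers in your warning ("6 keepalive
messages in 35 seconds") fit a simple countdown model.  The sketch
below is only an illustration, not the actual libvirt code: the
function name is made up, and the interval of 5 seconds and count of 6
are assumptions inferred from the log rather than checked against the
source.

```python
def keepalive_timeline(interval, count):
    """Model a keepalive countdown against a peer that never answers.

    Returns (requests_sent, seconds_elapsed) at the moment the peer is
    declared dead. Hypothetical helper, not a libvirt API.
    """
    elapsed = 0
    count_to_death = count
    sent = 0
    while True:
        elapsed += interval       # the timer fires after each idle interval
        if count_to_death == 0:   # every request so far went unanswered
            return sent, elapsed
        sent += 1                 # send one more keepalive request
        count_to_death -= 1


# Assumed values: interval=5 and count=6 are guesses taken from the log.
sent, elapsed = keepalive_timeline(interval=5, count=6)
print(f"No response after {sent} keepalive messages in {elapsed} seconds")
```

Under those assumptions the peer is given up on after interval *
(count + 1) seconds, which is easy to exceed while sitting at a
breakpoint.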

>but then it just hangs there.  A backtrace shows that it is hanging
>trying to reconnect:
>
>(gdb) bt
>#0  0x00007ffff39129dd in poll () at ../sysdeps/unix/syscall-template.S:81
>#1  0x00007ffff798e3c1 in virNetClientIOEventLoop (client=0x55555586ae10,
>    thiscall=0x55555586b090) at rpc/virnetclient.c:1514
>#2  0x00007ffff798ecbf in virNetClientIO (client=0x55555586ae10,
>    thiscall=0x55555586b090) at rpc/virnetclient.c:1786
>#3  0x00007ffff798f5ab in virNetClientSendInternal (client=0x55555586ae10,
>    msg=0x55555586ada0, expectReply=true, nonBlock=false)
>    at rpc/virnetclient.c:1958
>#4  0x00007ffff798f648 in virNetClientSendWithReply (client=0x55555586ae10,
>    msg=0x55555586ada0) at rpc/virnetclient.c:1986
>#5  0x00007ffff7990580 in virNetClientProgramCall (prog=0x55555586afd0,
>    client=0x55555586ae10, serial=0, proc=66, noutfds=0, outfds=0x0,
>    ninfds=0x0, infds=0x0, args_filter=0x7ffff394f270 <__GI_xdr_void>,
>    args=0x0, ret_filter=0x7ffff798152c <xdr_remote_auth_list_ret>,
>    ret=0x7fffffffd6f0) at rpc/virnetclientprogram.c:329
>#6  0x00007ffff7975d4c in callFull (conn=0x555555869fb0, priv=0x55555586ab30,
>    flags=0, fdin=0x0, fdinlen=0, fdout=0x0, fdoutlen=0x0, proc_nr=66,
>    args_filter=0x7ffff394f270 <__GI_xdr_void>, args=0x0,
>    ret_filter=0x7ffff798152c <xdr_remote_auth_list_ret>,
>    ret=0x7fffffffd6f0 "") at remote/remote_driver.c:6544
>#7  0x00007ffff7975e25 in call (conn=0x555555869fb0, priv=0x55555586ab30,
>    flags=0, proc_nr=66, args_filter=0x7ffff394f270 <__GI_xdr_void>, args=0x0,
>    ret_filter=0x7ffff798152c <xdr_remote_auth_list_ret>,
>    ret=0x7fffffffd6f0 "") at remote/remote_driver.c:6566
>#8  0x00007ffff795f7f1 in remoteAuthenticate (conn=0x555555869fb0,
>    priv=0x55555586ab30, auth=0x7ffff7d71520 <virConnectAuthDefault>,
>    authtype=0x0) at remote/remote_driver.c:3939
>#9  0x00007ffff7957f0a in doRemoteOpen (conn=0x555555869fb0,
>    priv=0x55555586ab30, auth=0x7ffff7d71520 <virConnectAuthDefault>, flags=0)
>    at remote/remote_driver.c:973
>#10 0x00007ffff795890b in remoteConnectOpen (conn=0x555555869fb0,
>    auth=0x7ffff7d71520 <virConnectAuthDefault>, flags=0)
>    at remote/remote_driver.c:1190
>#11 0x00007ffff78fc5e5 in do_open (name=0x0,
>    auth=0x7ffff7d71520 <virConnectAuthDefault>, flags=0) at libvirt.c:1147
>#12 0x00007ffff78fd093 in virConnectOpenAuth (name=0x0,
>    auth=0x7ffff7d71520 <virConnectAuthDefault>, flags=0) at libvirt.c:1390
>#13 0x0000555555575a67 in vshConnect (ctl=0x7fffffffdbc0, uri=0x0,
>    readonly=false) at virsh.c:337
>#14 0x0000555555575c23 in vshReconnect (ctl=0x7fffffffdbc0) at virsh.c:383
>#15 0x000055555557943a in vshCommandRun (ctl=0x7fffffffdbc0,
>    cmd=0x5555557f4900) at virsh.c:1853
>#16 0x000055555557d8de in main (argc=5, argv=0x7fffffffdda8) at virsh.c:3617
>
>It seems odd that virsh is hanging rather than gracefully exit, because
>it is trying too hard to reconnect in spite of being forcefully
>disconnected due to a keepalive timeout.  But I'm not sure what, if
>anything, we could tweak to make this failure mode not require a Ctrl-C
>on my end to get rid of the failed virsh.  It also reiterates the idea
>that we really ought to allow for a user-specified timeout to
>virConnect, rather than blocking forever.
>
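
Until something like that exists in the API, a caller can at least
bound the wait on its own side.  Below is a rough sketch of the idea
in Python, not a proposal for the libvirt implementation:
call_with_timeout and slow_open are made-up names, and slow_open only
stands in for a connection attempt that blocks.  Note that the worker
thread is abandoned on timeout, not cancelled.

```python
import threading
import time


def call_with_timeout(func, timeout, *args):
    """Run func(*args) in a daemon thread; raise TimeoutError if it is
    still running after `timeout` seconds. Hypothetical helper."""
    result = {}

    def worker():
        try:
            result["value"] = func(*args)
        except Exception as exc:  # hand the failure back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The blocked call keeps running in the background; we only stop waiting.
        raise TimeoutError(f"no result after {timeout} seconds")
    if "error" in result:
        raise result["error"]
    return result["value"]


def slow_open(uri):
    """Stand-in for a blocking connection attempt (assumption, not libvirt)."""
    time.sleep(0.2)
    return f"connected to {uri}"


try:
    call_with_timeout(slow_open, 0.05, "example-uri")
except TimeoutError as exc:
    print(f"giving up: {exc}")
```

With something along these lines virsh could report the failure and
exit instead of needing a Ctrl-C.
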
>--
>Eric Blake   eblake redhat com    +1-919-301-3266
>Libvirt virtualization library http://libvirt.org
>


