[libvirt] nwfilter deadlock with libpcap and TPACKET_V3

Daniel P. Berrangé berrange at redhat.com
Mon May 21 12:31:25 UTC 2018


On Sat, May 19, 2018 at 09:43:49PM +0200, Martin Kletzander wrote:
> On Fri, May 18, 2018 at 05:30:00PM +0100, Daniel P. Berrangé wrote:
> > I've just found a deadlock in nwfilter caused by something in libpcap.
> > 
> > There are 2 VMs running with the nwfilter stress test, but the deadlock
> > in fact only involves one VM.
> > 
> > Three threads in libvirtd
> > 
> > Thread 1 (Thread 0x7f3a26f726c0 (LWP 15384)):
> > #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > #1  0x00007f3a22ae1cc4 in __GI___pthread_mutex_lock (mutex=0x7f39fc023de0) at ../nptl/pthread_mutex_lock.c:78
> > #2  0x00007f3a264817f5 in virMutexLock (m=<optimized out>) at util/virthread.c:89
> > #3  0x00007f3a2645c34b in virObjectLock (anyobj=anyobj at entry=0x7f39fc023dd0) at util/virobject.c:429
> > #4  0x00007f39deca2b93 in qemuProcessHandleEvent (mon=<optimized out>, vm=0x7f39fc023dd0, eventName=0x5601c3940500 "SHUTDOWN",
> >    seconds=1526658178, micros=453867, details=0x5601c3a468b0 "{\"guest\":false}", opaque=0x7f39ac11ba20) at qemu/qemu_process.c:630
> > #5  0x00007f39decc2686 in qemuMonitorEmitEvent (mon=mon at entry=0x7f39fc049d70, event=event at entry=0x5601c3940500 "SHUTDOWN",
> >    seconds=1526658178, micros=453867, details=0x5601c3a468b0 "{\"guest\":false}") at qemu/qemu_monitor.c:1336
> > #6  0x00007f39decd6cce in qemuMonitorJSONIOProcessEvent (obj=0x5601c391fcc0, mon=0x7f39fc049d70) at qemu/qemu_monitor_json.c:174
> > #7  qemuMonitorJSONIOProcessLine (mon=mon at entry=0x7f39fc049d70, line=<optimized out>, msg=msg at entry=0x0)
> >    at qemu/qemu_monitor_json.c:211
> > #8  0x00007f39decd6f59 in qemuMonitorJSONIOProcess (mon=mon at entry=0x7f39fc049d70,
> >    data=0x5601c3964670 "{\"timestamp\": {\"seconds\": 1526658178, \"microseconds\": 453867}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, msg=msg at entry=0x0) at qemu/qemu_monitor_json.c:253
> > #9  0x00007f39decc11fc in qemuMonitorIOProcess (mon=0x7f39fc049d70) at qemu/qemu_monitor.c:446
> > #10 qemuMonitorIO (watch=watch at entry=323, fd=<optimized out>, events=0, events at entry=1, opaque=opaque at entry=0x7f39fc049d70)
> >    at qemu/qemu_monitor.c:706
> > #11 0x00007f3a26425743 in virEventPollDispatchHandles (fds=0x5601c39a3410, nfds=<optimized out>) at util/vireventpoll.c:508
> > #12 virEventPollRunOnce () at util/vireventpoll.c:657
> > #13 0x00007f3a26423e71 in virEventRunDefaultImpl () at util/virevent.c:327
> > #14 0x00007f3a26549ff5 in virNetDaemonRun (dmn=0x5601c38cd1b0) at rpc/virnetdaemon.c:850
> > #15 0x00005601c281544a in main (argc=<optimized out>, argv=<optimized out>) at remote/remote_daemon.c:1483
> > 
> > 
> > 
> > Thread 4 (Thread 0x7f3a0e011700 (LWP 15400)):
> > #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> > #1  0x00007f3a22ae1d64 in __GI___pthread_mutex_lock (mutex=0x7f39ac1cb3e0) at ../nptl/pthread_mutex_lock.c:113
> > #2  0x00007f3a264817f5 in virMutexLock (m=<optimized out>) at util/virthread.c:89
> > #3  0x00007f39f518653f in virNWFilterLockIface (ifname=ifname at entry=0x7f39fc025b60 "tck-vm2-if0")
> >    at nwfilter/nwfilter_learnipaddr.c:197
> > #4  0x00007f39f517aae5 in _virNWFilterTeardownFilter (ifname=0x7f39fc025b60 "tck-vm2-if0") at nwfilter/nwfilter_gentech_driver.c:1021
> > #5  0x00007f39f517b5fc in virNWFilterTeardownFilter (net=0x7f39fc044f80) at nwfilter/nwfilter_gentech_driver.c:1039
> > #6  0x00007f3a264da77c in virDomainConfVMNWFilterTeardown (vm=vm at entry=0x7f39fc023dd0) at conf/domain_nwfilter.c:65
> > #7  0x00007f39decab77a in qemuProcessStop (driver=driver at entry=0x7f39ac11ba20, vm=0x7f39fc023dd0,
> >    reason=reason at entry=VIR_DOMAIN_SHUTOFF_DESTROYED, asyncJob=asyncJob at entry=QEMU_ASYNC_JOB_NONE, flags=0)
> >    at qemu/qemu_process.c:6767
> > #8  0x00007f39decfd47e in qemuDomainDestroyFlags (dom=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:2237
> > #9  0x00007f3a2661c593 in virDomainDestroy (domain=domain at entry=0x7f39f00191e0) at libvirt-domain.c:475
> > #10 0x00005601c284052a in remoteDispatchDomainDestroy (server=0x5601c38cd240, msg=0x5601c3a189e0, args=<optimized out>,
> >    rerr=0x7f3a0e010b60, client=0x5601c393d9f0) at remote/remote_daemon_dispatch_stubs.h:4615
> > #11 remoteDispatchDomainDestroyHelper (server=0x5601c38cd240, client=0x5601c393d9f0, msg=0x5601c3a189e0, rerr=0x7f3a0e010b60,
> >    args=<optimized out>, ret=0x7f39f0004970) at remote/remote_daemon_dispatch_stubs.h:4591
> > #12 0x00007f3a265443f4 in virNetServerProgramDispatchCall (msg=0x5601c3a189e0, client=0x5601c393d9f0, server=0x5601c38cd240,
> >    prog=0x5601c3938d70) at rpc/virnetserverprogram.c:437
> > #13 virNetServerProgramDispatch (prog=0x5601c3938d70, server=server at entry=0x5601c38cd240, client=0x5601c393d9f0, msg=0x5601c3a189e0)
> >    at rpc/virnetserverprogram.c:304
> > #14 0x00007f3a2654a778 in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>,
> >    srv=0x5601c38cd240) at rpc/virnetserver.c:145
> > #15 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x5601c38cd240) at rpc/virnetserver.c:166
> > #16 0x00007f3a264822f0 in virThreadPoolWorker (opaque=opaque at entry=0x5601c38d9da0) at util/virthreadpool.c:167
> > #17 0x00007f3a26481678 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
> > #18 0x00007f3a22adf564 in start_thread (arg=<optimized out>) at pthread_create.c:463
> > #19 0x00007f3a2281331f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > 
> > 
> > 
> > Thread 19 (Thread 0x7f39ceacb700 (LWP 17216)):
> > #0  0x00007f3a22808929 in __GI___poll (fds=fds at entry=0x7f39ceaca930, nfds=nfds at entry=1, timeout=-1)
> >    at ../sysdeps/unix/sysv/linux/poll.c:29
> > #1  0x00007f39f4f3409d in poll (__timeout=<optimized out>, __nfds=1, __fds=0x7f39ceaca930) at /usr/include/bits/poll2.h:46
> > #2  pcap_wait_for_frames_mmap (handle=handle at entry=0x7f39ac1a7260) at ./pcap-linux.c:4561
> > #3  0x00007f39f4f38b98 in pcap_read_linux_mmap_v3 (handle=0x7f39ac1a7260, max_packets=1,
> >    callback=0x7f39f4f339f0 <pcap_oneshot_mmap>, user=0x7f39ceaca9d0 "@\252\254\316\071\177") at ./pcap-linux.c:5059
> > #4  0x00007f39f4f3d632 in pcap_next (p=p at entry=0x7f39ac1a7260, h=h at entry=0x7f39ceacaa40) at ./pcap.c:243
> > #5  0x00007f39f5186ace in learnIPAddressThread (arg=0x7f39fc058f10) at nwfilter/nwfilter_learnipaddr.c:486
> > #6  0x00007f3a264816a2 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
> > #7  0x00007f3a22adf564 in start_thread (arg=<optimized out>) at pthread_create.c:463
> > #8  0x00007f3a2281331f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > 
> > 
> > So in summary:
> > 
> >   Thread 1 is responding to the QEMU SHUTDOWN event
> >      Waiting to lock the VM mutex, which is held by thread 4
> > 
> >   Thread 4 is responding to virDomainDestroy
> >      Holds the VM mutex
> >      Holds the nwfilter update mutex
> >      Waiting to lock the interface mutex, which is held by thread 19
> > 
> >   Thread 19 is doing nwfilter IP address learning
> >      Holds the interface mutex
> > 
> > Thread 19 is using libpcap to capture packets, and has told it to use a
> > 500ms read timeout. So it is sitting in pcap_next(), waiting in poll(),
> > while holding the interface lock. The code is written such that
> > pcap_next() is expected to return when the 500ms timeout expires, at
> > which point the thread checks for a request to stop learning and, if
> > asked, releases the interface lock, allowing thread 4 to continue.
> > 
> > As we can see from the stack trace, though, pcap_next() is not honouring
> > our 500ms timeout: it has called poll() with an infinite timeout.
> > 
> > This seems to be intentional behaviour in libpcap, which ignores our
> > requested timeout:
> > 
> > 	} else if (handlep->timeout > 0) {
> > #ifdef HAVE_TPACKET3
> > 		/*
> > 		 * For TPACKET_V3, the timeout is handled by the kernel,
> > 		 * so block forever; that way, we don't get extra timeouts.
> > 		 * Don't do that if we have a broken TPACKET_V3, though.
> > 		 */
> > 		if (handlep->tp_version == TPACKET_V3 && !broken_tpacket_v3)
> > 			handlep->poll_timeout = -1;	/* block forever, let TPACKET_V3 wake us up */
> > 		else
> > #endif
> > 			handlep->poll_timeout = handlep->timeout;	/* block for that amount of time */
> > 
> > 
> > We've taken the path tp_version == TPACKET_V3 && !broken_tpacket_v3.
> > 
> > This is the case on anything with a kernel >= 3.19:
> > 
> > 	/*
> > 	 * OK, that was a valid minor version.
> > 	 * Is this 3.19 or newer?
> > 	 */
> > 	if (major >= 4 || (major == 3 && minor >= 19)) {
> > 		/* Yes. TPACKET_V3 works correctly. */
> > 		broken_tpacket_v3 = 0;
> > 	}
> > 
> > I've no clue what it is about TPACKET_V3 that is supposed to wake us
> > up from poll, but it clearly isn't happening :-(
> > 
> 
> Sorry, me neither, but I remembered something about timeouts being fishy in pcap
> back when I used it.  And looking at the man page now, it seems I was right:
> 
>  NOTE: the read timeout cannot be used to cause calls that read packets to
>  return within a limited period of time, because, on some platforms, the
>  read timeout isn't supported, and, on other platforms, the timer doesn't
>  start until at least one packet arrives.  This means that the read timeout
>  should NOT be used, for example, in an interactive application to allow
>  the packet capture loop to ``poll'' for user input periodically, as
>  there's no guarantee that a call reading packets will return after the
>  timeout expires even if no packets have arrived.
> 
> I'm not sure why the timeout is deliberately ignored in your particular case,
> but maybe that excerpt helps.
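
To make the problem concrete, the learning thread's loop is essentially
the following pattern (a simplified sketch with a hypothetical "terminate"
flag, not the exact code in nwfilter_learnipaddr.c), and it only works if
pcap_next() really does give up after the 500ms read timeout:

  #include <pcap.h>
  #include <stdbool.h>

  static void
  learn_loop(const char *ifname, volatile bool *terminate)
  {
      char errbuf[PCAP_ERRBUF_SIZE];
      struct pcap_pkthdr hdr;
      pcap_t *handle;

      /* 500ms read timeout, which pcap_next() is expected to honour */
      if (!(handle = pcap_open_live(ifname, 65536, 1, 500, errbuf)))
          return;

      while (!*terminate) {
          /* With TPACKET_V3 libpcap turns this into poll(..., -1), so
           * the loop never gets back to the terminate check above */
          const u_char *pkt = pcap_next(handle, &hdr);
          if (pkt) {
              /* inspect the packet and learn the IP address */
          }
      }

      pcap_close(handle);
  }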

Luckily, I discovered that we can take pcap's wait out of the equation by
asking it for the underlying FD and running poll() ourselves, which gives
us back the 500ms timeout we want.
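
A minimal sketch of that approach (my rough outline, not the actual
patch): fetch the capture descriptor with pcap_get_selectable_fd(), do
the 500ms wait in our own poll() call, and only then ask libpcap for the
packet, so the loop always gets back to its termination check.

  #include <pcap.h>
  #include <poll.h>

  static const u_char *
  capture_next_with_timeout(pcap_t *handle, struct pcap_pkthdr *hdr,
                            int timeout_ms)
  {
      struct pollfd fds = {
          .fd = pcap_get_selectable_fd(handle),
          .events = POLLIN,
      };

      /* Wait at most timeout_ms ourselves; <= 0 means no packet (or an
       * error), so the caller can re-check its terminate flag */
      if (poll(&fds, 1, timeout_ms) <= 0)
          return NULL;

      /* Data is pending, so this call won't block for long */
      return pcap_next(handle, hdr);
  }

To be safe the handle would also be put into non-blocking mode with
pcap_setnonblock(), so that pcap_next() can never stall even if poll()
wakes up without a full packet being available.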

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



