[dm-devel] multipathd: locks itself in udev trigger
Alban Browaeys
prahal at yahoo.com
Thu Apr 6 12:24:07 UTC 2017
The bcache backing partition bcache0 triggers a udev add event that is handled by multipathd.
For some reason, the other "bare" partitions (sda<n>) do not.
The problem is that, since commit c6a18f4541d0a161e2f5fed8c67d9732bf512b37 ("fix INIT_REQUESTED_UDEV code"),
handling this event makes the uevent thread deadlock on itself.
That commit moved the "uev_add_path(uev, vecs);" call in "uev_update_path" under the fast (non-recursive)
lock taken by "lock(&vecs->lock);". Because uev_add_path also calls "lock(&vecs->lock);", multipathd hangs
on this second lock attempt in the same thread.
After that, "multipathd list paths" and other multipathd commands time out.
This also delays systemd shutdown/reboot by a minute while systemd waits for the multipathd service to stop.
The backtrace was:
(gdb) t a a bt
Thread 6 (Thread 0x7f922663c700 (LWP 545)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007f9228bd3602 in ?? () from /usr/lib/x86_64-linux-gnu/liburcu.so.4
#2 0x00007f92289ba424 in start_thread (arg=0x7f922663c700) at pthread_create.c:333
#3 0x00007f922825c9bf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 5 (Thread 0x7f9229734700 (LWP 543)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f92289bcb85 in __GI___pthread_mutex_lock (mutex=0x556fefc43080) at ../nptl/pthread_mutex_lock.c:80
#2 0x0000556fedcbe42d in lock (a=0x556fefc43080) at ../libmultipath/lock.h:12
#3 uev_add_path (vecs=0x556fefc43080, uev=<optimized out>, uev=<optimized out>) at main.c:627
#4 0x0000556fedcbe9c9 in uev_update_path (uev=0x7f9220001510, vecs=0x556fefc43080) at main.c:998
#5 0x0000556fedcbecdb in uev_trigger (uev=0x7f9220001510, trigger_data=0x556fefc43080) at main.c:1146
#6 0x00007f92292091b2 in service_uevq (tmpq=tmpq@entry=0x7f9229733b10) at uevent.c:89
#7 0x00007f9229209280 in uevent_dispatch (uev_trigger=<optimized out>, trigger_data=<optimized out>) at uevent.c:145
#8 0x0000556fedcbc2cc in uevqloop (ap=0x556fefc43080) at main.c:1177
#9 0x00007f92289ba424 in start_thread (arg=0x7f9229734700) at pthread_create.c:333
#10 0x00007f922825c9bf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 4 (Thread 0x7f9229745700 (LWP 542)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f92289bcb85 in __GI___pthread_mutex_lock (mutex=0x556fefc43080) at ../nptl/pthread_mutex_lock.c:80
#2 0x0000556fedcbfb45 in lock (a=0x556fefc43080) at ../libmultipath/lock.h:12
#3 checkerloop (ap=0x556fefc43080) at main.c:1827
#4 0x00007f92289ba424 in start_thread (arg=0x7f9229745700) at pthread_create.c:333
#5 0x00007f922825c9bf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 3 (Thread 0x7f9229810700 (LWP 541)):
#0 0x00007f9228253611 in __GI_ppoll (fds=0x7f92180021e0, nfds=nfds@entry=1, timeout=<optimized out>, timeout@entry=0x556fedecc020 <sleep_time>, sigmask=sigmask@entry=0x7f922980fa60) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1 0x0000556fedcc13ba in ppoll (__ss=0x7f922980fa60, __timeout=0x556fedecc020 <sleep_time>, __nfds=1, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2 uxsock_listen (uxsock_trigger=0x556fedcbb520 <uxsock_trigger>, trigger_data=0x556fefc43080) at uxlsnr.c:204
#3 0x0000556fedcbbd5a in uxlsnrloop (ap=0x556fefc43080) at main.c:1239
#4 0x00007f92289ba424 in start_thread (arg=0x7f9229810700) at pthread_create.c:333
#5 0x00007f922825c9bf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 2 (Thread 0x7f9229851700 (LWP 540)):
#0 0x00007f922825354d in poll () at ../sysdeps/unix/syscall-template.S:84
#1 0x00007f9229209f3a in poll (__timeout=<optimized out>, __nfds=1, __fds=0x7f9229850a88) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2 uevent_listen (udev=0x556fefbec040) at uevent.c:515
#3 0x0000556fedcbc235 in ueventloop (ap=0x556fefbec040) at main.c:1166
#4 0x00007f92289ba424 in start_thread (arg=0x7f9229851700) at pthread_create.c:333
#5 0x00007f922825c9bf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 1 (Thread 0x7f9229746f00 (LWP 537)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x0000556fedcc0aba in child (param=<optimized out>) at main.c:2407
#2 0x0000556fedcbb0df in main (argc=<optimized out>, argv=0x7fff81f9a0d8) at main.c:2664
As a local workaround, I moved the "uev_add_path" call in "uev_update_path" back out from under the lock,
while keeping it under the pp->initialized check.
https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=859157;filename=fix_uev_update_path_udevadd_recursive_lock_deadlock.diff;msg=5
This change fixes the reboot delay, but I have no multipath setup, so I cannot check for regressions.
-Alban