[dm-devel] [PATCH] multipathd: avoid crash in uevent_cleanup()

lixiaokeng lixiaokeng at huawei.com
Wed Feb 3 10:48:48 UTC 2021



On 2021/2/3 4:52, Martin Wilck wrote:
> did this fix your "crash on exit" issue?

Unfortunately, the issue is not solved.


The test system has 100 LUNs, and the following two scripts, run at the same time, reproduce the issue.

#!/bin/bash
while true
do
        for i in `seq 1 20`
        do
                udevadm trigger
        done
        sleep 1
done

#!/bin/bash
while true
do
        for i in `seq 1 10`
        do
                systemctl restart multipathd
        done
        kill -9 `pidof /sbin/multipathd`
        sleep 1
done

This produces several different coredump stacks.

0.8.5 (I'm not sure these are the only two stacks in 0.8.5)
First stack:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f59a0109647 in ?? ()
[Current thread is 1 (LWP 1997690)]
(gdb) bt
#0  0x00007f59a0109647 in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) info threads
  Id   Target Id         Frame
* 1    LWP 1997690       0x00007f59a0109647 in ?? ()
  2    LWP 1996840       0x00007f59a0531de7 in ?? ()
  3    LWP 1997692       0x00007f59a0109647 in ?? ()
  4    LWP 1996857       0x00007f59a020d169 in ?? ()

Second stack:
#0  0x0000ffffb6118f4c in aarch64_fallback_frame_state (context=0xffffb523f200, context=0xffffb523f200, fs=0xffffb523e700) at ./md-unwind-support.h:74
#1  uw_frame_state_for (context=context@entry=0xffffb523f200, fs=fs@entry=0xffffb523e700) at ../../../libgcc/unwind-dw2.c:1257
#2  0x0000ffffb6119ef4 in _Unwind_ForcedUnwind_Phase2 (exc=exc@entry=0xffffb52403b0, context=context@entry=0xffffb523f200) at ../../../libgcc/unwind.inc:155
#3  0x0000ffffb611a284 in _Unwind_ForcedUnwind (exc=0xffffb52403b0, stop=stop@entry=0xffffb64846c0 <unwind_stop>, stop_argument=0xffffb523f630) at ../../../libgcc/unwind.inc:207
#4  0x0000ffffb6484860 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:121
#5  0x0000ffffb6482d08 in __do_cancel () at pthreadP.h:304
#6  __GI___pthread_testcancel () at pthread_testcancel.c:26
#7  0x0000ffffb5c528e8 in ?? ()

There are other stacks in 0.7.7:
Second stack:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x0000ffff9d8d281c in __GI_abort () at abort.c:79
#2  0x0000ffff9d90b818 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0xffff9d9cb888 "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3  0x0000ffff9d911f6c in malloc_printerr (str=str@entry=0xffff9d9c90d0 "free(): invalid pointer") at malloc.c:5389
#4  0x0000ffff9d913780 in _int_free (av=0xffff9da0ba58 <main_arena>, p=0xffff98000070, have_lock=0) at malloc.c:4172
#5  0x0000ffff9dc2b608 in internal_hashmap_clear (h=h@entry=0xffff9814dfa0, default_free_key=<optimized out>, default_free_value=<optimized out>) at ../src/basic/hashmap.c:902
#6  0x0000ffff9dc2b700 in internal_hashmap_free (h=<optimized out>, default_free_key=<optimized out>, default_free_value=<optimized out>, default_free_value=<optimized out>, default_free_key=<optimized out>,
    h=<optimized out>) at ../src/basic/hashmap.c:874
#7  0x0000ffff9dc2b88c in ordered_hashmap_free_free_free () at ../src/basic/hashmap.h:118
#8  device_free (device=0xffff9814d420) at ../src/libsystemd/sd-device/sd-device.c:68
#9  sd_device_unref (p=<optimized out>) at ../src/libsystemd/sd-device/sd-device.c:78
#10 0x0000ffff9dc36cc8 in sd_device_unrefp () at ../src/systemd/sd-device.h:118
#11 device_new_from_nulstr (len=<optimized out>, nulstr=0xffff9d3253d0 "SEQNUM", ret=<synthetic pointer>) at ../src/libsystemd/sd-device/device-private.c:448
#12 device_monitor_receive_device (m=0xffff98000b20, ret=ret@entry=0xffff9d327388) at ../src/libsystemd/sd-device/device-monitor.c:447
#13 0x0000ffff9dc38bf4 in udev_monitor_receive_sd_device (ret=0xffff9d327388, udev_monitor=0xffff98000c70) at ../src/libudev/libudev-monitor.c:207
#14 udev_monitor_receive_device (udev_monitor=0xffff98000c70, udev_monitor@entry=0xffff9d3273a0) at ../src/libudev/libudev-monitor.c:253
#15 0x0000ffff9dcd9478 in uevent_listen (udev=0xffff9d327f40) at uevent.c:853
#16 0x0000aaaae3783514 in ueventloop (ap=0xffffe160fe80) at main.c:1518
#17 0x0000ffff9dbb67ac in start_thread (arg=0xffff9d3ad380) at pthread_create.c:486
#18 0x0000ffff9d97047c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Some other stacks end in glibc and libgcc.

If exit() is called before all of the pthread_cancel() calls in the child of 0.7.7, there is no crash at all.
So I believe there are many races in thread cancellation, but I don't know where they come from.
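
For illustration only, here is a minimal C sketch of the ordering
involved. It is not multipathd code, and it is not a claim about the
root cause. All names in it (listener_state, listener,
listener_cleanup, shutdown_demo) are hypothetical. It assumes the
default deferred cancellation type. pthread_cancel() only queues a
request, and the target thread unwinds later at a cancellation point,
so anything the thread still uses should stay valid until
pthread_join() has returned.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for state owned by a listener thread. */
struct listener_state {
        char *buf;      /* heap buffer the worker would use */
        int cleaned_up; /* set to 1 by the cleanup handler */
};

/* Runs in the worker thread while it unwinds after cancellation. */
static void listener_cleanup(void *arg)
{
        struct listener_state *st = arg;

        free(st->buf);
        st->buf = NULL;
        st->cleaned_up = 1;
}

static void *listener(void *arg)
{
        struct listener_state *st = arg;

        pthread_cleanup_push(listener_cleanup, st);
        for (;;) {
                /* sleep() is a cancellation point: a pending
                 * pthread_cancel() request takes effect here. */
                sleep(1);
        }
        pthread_cleanup_pop(1); /* not reached; pairs with the push */
        return NULL;
}

/* Returns 0 if the worker was cancelled, joined, and cleaned up. */
int shutdown_demo(void)
{
        struct listener_state st = { .buf = malloc(64), .cleaned_up = 0 };
        pthread_t tid;
        void *res = NULL;
        int rc;

        if (st.buf == NULL)
                return -1;
        if (pthread_create(&tid, NULL, listener, &st) != 0) {
                free(st.buf);
                return -1;
        }

        /* Only queues the request; the worker may still be running. */
        rc = pthread_cancel(tid);
        /* Wait until the worker has finished unwinding. Only after
         * this is it safe to free shared state or call exit(). */
        if (rc == 0)
                rc = pthread_join(tid, &res);
        if (rc != 0)
                return -1;

        return (res == PTHREAD_CANCELED && st.cleaned_up == 1) ? 0 : 1;
}
```

Calling shutdown_demo() from a main() is enough to try it. Removing the
pthread_join() call and returning right after pthread_cancel() would
let the caller's stack variable go away while the worker may still be
unwinding, which is the kind of window the sketch is meant to show.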

Regards,
Lixiaokeng



