[dm-devel] [PATCH] multipathd: avoid crash in uevent_cleanup()

lixiaokeng lixiaokeng at huawei.com
Sun Feb 7 07:05:21 UTC 2021



On 2021/2/5 19:08, Martin Wilck wrote:
> On Thu, 2021-02-04 at 16:06 +0100, Martin Wilck wrote:
>> On Thu, 2021-02-04 at 09:40 +0800, lixiaokeng wrote:
>>>
>>>
>>> On 2021/2/3 21:57, Martin Wilck wrote:
>>>>> If exit() is called before all the pthread_cancel() calls in
>>>>> child() of 0.7.7, there is no crash.
>>>> What do you mean by "exit() before all pthread_cancel"? If this
>>>> happens on pthread_cancel(), and you don't call that function, this
>>>> would actually be expected.
>>>
>>> When running_state is DAEMON_SHUTDOWN, break out of the while loop
>>> and then call _exit(0). But it is not a great method.
>>
>> I wonder if it would be possible to figure out the LWP numbers (process
>> IDs) of the different threads before the crash occurs, and compare this
>> to the gdb output
>>
>> (gdb) info threads
>>   Id   Target Id         Frame
>> * 1    LWP 1997690       0x00007f59a0109647 in ?? ()
>>   2    LWP 1996840       0x00007f59a0531de7 in ?? ()
>>   3    LWP 1997692       0x00007f59a0109647 in ?? ()
>>   4    LWP 1996857       0x00007f59a020d169 in ?? ()
>>
>> ... to identify which thread crashed, and if it's always the same one.
> 
> From the LWP numbers, threads 2 and 4 are probably TUR checkers
> (temporary threads). thread 1 can't be easily identified. Could you 
> provide the stack of thread 3? From that, we might be able to infer
> which thread crashed, because multipathd always starts its threads in
> the same sequence.
> 

Here is the stack from another core dump (the core dump is attached):

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `/sbin/multipathd -d -s'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fe17dd97456 in ?? ()
[Current thread is 1 (Thread 0x7fe17cc00700 (LWP 3093458))]
(gdb) bt
#0  0x00007fe17dd97456 in ?? ()
#1  0x0000006800007530 in ?? ()
#2  0x00007fe17cbffb2e in ?? ()
#3  0xffffffff00000053 in ?? ()
#4  0x0000000000002006 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb) info thread
  Id   Target Id                           Frame
* 1    Thread 0x7fe17cc00700 (LWP 3093458) 0x00007fe17dd97456 in ?? ()
  2    Thread 0x7fe17d421700 (LWP 3092869) 0x00007fe17da8a929 in __GI___poll (fds=fds@entry=0x0, nfds=nfds@entry=0, timeout=timeout@entry=10) at ../sysdeps/unix/sysv/linux/poll.c:29
  3    Thread 0x7fe17d4a7a80 (LWP 3092860) 0x00007fe17da96e27 in socket () at ../sysdeps/unix/syscall-template.S:78
  4    Thread 0x7fe17cc0e700 (LWP 3093459) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  5    Thread 0x7fe17cbc1700 (LWP 3093460) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  6    Thread 0x7fe17cb4c700 (LWP 3093461) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  7    Thread 0x7fe17cb5e700 (LWP 3093462) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  8    Thread 0x7fe17cb67700 (LWP 3093463) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
  9    Thread 0x7fe17cb70700 (LWP 3093464) 0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fe17d421700 (LWP 3092869))]
#0  0x00007fe17da8a929 in __GI___poll (fds=fds@entry=0x0, nfds=nfds@entry=0, timeout=timeout@entry=10) at ../sysdeps/unix/sysv/linux/poll.c:29
29	  return SYSCALL_CANCEL (poll, fds, nfds, timeout);
(gdb) bt
#0  0x00007fe17da8a929 in __GI___poll (fds=fds@entry=0x0, nfds=nfds@entry=0, timeout=timeout@entry=10) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fe17dcdcd91 in poll (__timeout=10, __nfds=0, __fds=0x0) at /usr/include/bits/poll2.h:46
#2  call_rcu_thread (arg=0x557ab760e210) at urcu-call-rcu-impl.h:383
#3  0x00007fe17dcbef4b in start_thread (arg=0x7fe17d421700) at pthread_create.c:486
#4  0x00007fe17da957ef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) thread 3
[Switching to thread 3 (Thread 0x7fe17d4a7a80 (LWP 3092860))]
#0  0x00007fe17da96e27 in socket () at ../sysdeps/unix/syscall-template.S:78
78	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007fe17da96e27 in socket () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fe17db95f34 in sd_pid_notify_with_fds (pid=0, unset_environment=0, state=0x557ab58af801 "ERRNO=0", fds=0x0, n_fds=0) at ../src/libsystemd/sd-daemon/sd-daemon.c:481
#2  0x0000557ab58a6ef7 in child (param=<optimized out>) at main.c:3140
#3  0x0000557ab589f503 in main (argc=<optimized out>, argv=0x7fffe51c3c08) at main.c:3325
(gdb) thread 4
[Switching to thread 4 (Thread 0x7fe17cc0e700 (LWP 3093459))]
#0  0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
78	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fe17dd97456 in ?? ()
#2  0x0000006900007530 in ?? ()
#3  0x00007fe17cc0db2e in ?? ()
#4  0xffffffff00000053 in ?? ()
#5  0x0000000000002006 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fe17cbc1700 (LWP 3093460))]
#0  0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
78	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007fe17da8c507 in ioctl () at ../sysdeps/unix/syscall-template.S:78
#1  0x00007fe17dd97456 in ?? ()
#2  0x0000006a00007530 in ?? ()
#3  0x00007fe17cbc0b2e in ?? ()
#4  0xffffffff00000053 in ?? ()
#5  0x0000000000002006 in ?? ()
#6  0x0000000000000000 in ?? ()
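
Regarding the suggestion quoted above to figure out the LWP numbers of
the threads before the crash and compare them with "info threads": one
possible way is to log the kernel thread ID at the start of each thread.
The following is only a minimal sketch under that assumption. The names
lwp_record, current_lwp(), record_lwp() and example_thread() are
hypothetical and are not part of multipathd; only syscall(SYS_gettid)
and the pthread calls are real interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record: a thread label and the LWP it ran as. */
struct lwp_record {
	const char *name;
	pid_t lwp;
};

/* Kernel thread ID of the caller; this is the "LWP" number gdb prints. */
static pid_t current_lwp(void)
{
	return (pid_t)syscall(SYS_gettid);
}

/* Store and log the caller's LWP; meant to be called first in a thread. */
static void record_lwp(struct lwp_record *rec)
{
	assert(rec != NULL && rec->name != NULL);
	rec->lwp = current_lwp();
	fprintf(stderr, "thread %s: LWP %ld\n", rec->name, (long)rec->lwp);
}

/* Example thread body: records its own LWP, then returns immediately. */
static void *example_thread(void *arg)
{
	record_lwp(arg);
	return NULL;
}
```

A thread started with pthread_create(&t, NULL, example_thread, &rec)
would then print one "thread <name>: LWP <n>" line to stderr, and that
number can be matched against the LWP column of "info threads" in a
later core dump.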

Regards
Lixiaokeng
-------------- next part --------------
A non-text attachment was scrubbed...
Name: core.multipathd.0.5912fc2ca07945d8ab7d921ca6ff28ab.3092860.1612673516000000000000.lz4
Type: application/octet-stream
Size: 1533834 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20210207/efcf10be/attachment.obj>

