[Freeipa-users] DirSrv hanging
Adam Bishop
Adam.Bishop at jisc.ac.uk
Sat Jan 7 05:19:42 UTC 2017
I have a standalone FreeIPA instance that is becoming unresponsive every few hours. While in this state it will accept connections, but will not do anything with them (i.e. if you connect an ldaps client to 636, you see SYN->SYNACK->ACK->ClientHello, but a ServerHello is not returned). This system is running FreeIPA 4.4.0 currently, but this also occurred on 4.2.x. Time is synchronised correctly and this is a fairly new installation so all the PKI expiry dates are well into the future.
It handles queries without complaint, right up until the point it doesn't.
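For anyone wanting to catch the hang as it happens, the symptom above (TCP handshake completes, ClientHello goes out, no ServerHello comes back) can be probed from a client. The following is only a minimal sketch: the function name `probe_ldaps`, the result strings and the 5 second default timeout are my own illustrative choices, not part of FreeIPA or 389-ds. Certificate verification is deliberately disabled because the probe only checks whether the handshake completes.

```python
import socket
import ssl


def probe_ldaps(host, port=636, timeout=5.0):
    """Classify an LDAPS endpoint as 'ok', 'tcp_failed', 'tls_stalled' or 'tls_error'.

    'tls_stalled' means the TCP connection was accepted but the TLS
    handshake did not finish within `timeout` seconds.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"

    # Assumption: we only care about liveness, so skip certificate checks.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        # wrap_socket() performs the handshake and inherits the socket timeout.
        with ctx.wrap_socket(sock, server_hostname=host):
            return "ok"
    except socket.timeout:
        return "tls_stalled"
    except (ssl.SSLError, OSError):
        return "tls_error"
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical host name; substitute the real server.
    print(probe_ldaps("ldap-001.example", 636))
```

Run from cron or a monitoring check, a result of `tls_stalled` would correspond to the state described here, while `tcp_failed` would point at the listener being gone entirely.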
Inspecting the process with strace shows it waiting on a socket:
getpeername(7, 0x7ffeb749af70, [112]) = -1 ENOTCONN (Transport endpoint is not connected)
poll([{fd=50, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN},
{fd=66, events=POLLIN}, {fd=80, events=POLLIN}, {fd=79, events=POLLIN}, {fd=78, events=POLLIN},
{fd=77, events=POLLIN}, {fd=76, events=POLLIN}, {fd=75, events=POLLIN}, {fd=73, events=POLLIN},
{fd=71, events=POLLIN}, {fd=70, events=POLLIN}, {fd=68, events=POLLIN}], 15, 250) = 0 (Timeout)
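As a point of reference when reading the trace: on Linux, getpeername() fails with ENOTCONN on any TCP socket that has no connected peer, and that includes a socket that is merely listening. The sketch below demonstrates this on a throwaway listener; the helper name `peername_errno` is hypothetical and the behaviour is assumed to be that of a Linux host.

```python
import errno
import socket


def peername_errno(sock):
    """Return None if sock has a connected peer, else the errno getpeername() raised."""
    try:
        sock.getpeername()
        return None
    except OSError as exc:
        return exc.errno


if __name__ == "__main__":
    # A listening socket has no peer, so getpeername() is expected to fail.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    code = peername_errno(listener)
    print(errno.errorcode.get(code, code))
    listener.close()
```

So an ENOTCONN from getpeername() is not by itself evidence of a fault; it depends on what kind of socket the descriptor refers to.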
fd 7 is a constant:
ls -l /proc/2428/fd
lrwx------. 1 root root 64 Jan 6 17:16 7 -> socket:[18972]
I'm not sure I'm reading the fd entry correctly, but I believe this is the matching socket:
[root@ldap-001 log]# lsof -p 2428 | grep 18972
ns-slapd 2428 dirsrv 7u IPv6 18972 0t0 TCP *:ldaps (LISTEN)
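The same fd-to-socket mapping can be done without lsof by reading /proc directly. This is a rough sketch for Linux only; the helper names are made up for illustration. It relies on two things: /proc/<pid>/fd/<n> is a symlink of the form `socket:[inode]` for sockets, and /proc/net/tcp6 lists one socket per line, with the state in the fourth column (`0A` is LISTEN) and the inode in the tenth.

```python
import os
import re

SOCKET_LINK = re.compile(r"^socket:\[(\d+)\]$")


def socket_inode(link_target):
    """Extract the inode from a 'socket:[N]' symlink target, or None if it is not a socket."""
    match = SOCKET_LINK.match(link_target)
    return int(match.group(1)) if match else None


def fd_socket_inode(pid, fd):
    """Return the socket inode behind /proc/<pid>/fd/<fd> (needs permission to read it)."""
    return socket_inode(os.readlink(f"/proc/{pid}/fd/{fd}"))


def listening_inodes(lines):
    """Collect inodes of LISTEN-state sockets from /proc/net/tcp or tcp6 content."""
    inodes = set()
    for line in lines:
        fields = line.split()
        # Skip the header row and anything too short to carry an inode column.
        if len(fields) < 10 or fields[0] == "sl":
            continue
        if fields[3] == "0A":  # 0A is TCP_LISTEN
            inodes.add(int(fields[9]))
    return inodes


if __name__ == "__main__":
    pid, fd = 2428, 7  # values taken from the output above
    inode = fd_socket_inode(pid, fd)
    with open("/proc/net/tcp6") as fh:
        print(inode, inode in listening_inodes(fh))
```

If the inode for fd 7 shows up in the LISTEN set, that agrees with the lsof line above.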
A backtrace from GDB follows at the end of this message - it shows the address struct, which just contains the source address of the last connection to port 636 before DirSrv hangs.
The server is configured to use the FreeIPA dns service as its own resolver. The DNS service is definitely still running, and resolves the query fine when executed with dig.
There is nothing in the DirSrv logs that indicates an issue. The KDC logs indicate a problem, but I don't know if DirSrv is hanging because of the KDC, or if the KDC is just reflecting that DirSrv is unresponsive.
Jan 06 21:53:29 ldap-001.domain krb5kdc[2702](info): AS_REQ (6 etypes {18 17 16 23 25 26}) 193.63.63.108: LOOKING_UP_CLIENT: host/ldap-001.domain@DOMAIN for krbtgt/DOMAIN@DOMAIN, Server error
Jan 06 21:53:29 ldap-001.domain krb5kdc[2702](info): closing down fd 12
sssd reports an issue too, but that is almost certainly due to an unresponsive DirSrv:
(Sat Jan 7 03:16:08 2017) [sssd[nss]] [sss_dp_get_reply] (0x0010): The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]
I'm not really sure what to check next - all the individual components seem to be working, but not together.
Any suggestions are appreciated.
Regards,
Adam Bishop
gpg: E75B 1F92 6407 DFDF 9F1C BF10 C993 2504 6609 D460
jisc.ac.uk
---
[root@ldap-001 log]# gdb -p 2428
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 2428
0x00007fc80bf4fdfd in poll () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
Missing separate debuginfos, use: debuginfo-install ipa-server-4.4.0-14.el7.centos.1.1.x86_64
(gdb) break getpeername
Breakpoint 1 at 0x7fc80bf5b4b0: file ../sysdeps/unix/syscall-template.S, line 81.
(gdb) cont
Continuing.
Breakpoint 1, getpeername () at ../sysdeps/unix/syscall-template.S:81
81 T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt full
#0 getpeername () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1 0x00007fc80c888389 in pt_GetPeerName (fd=0x7fc810d92010, addr=0x7ffeb749af70) at ../../../nspr/pr/src/pthreads/ptio.c:2795
rv = -1
addr_len = 112
#2 0x00007fc80d3fec23 in ssl_Poll (fd=0x7fc810b69260, how_flags=<optimized out>, p_out_flags=0x7ffeb749b06c) at sslsock.c:2639
ss = 0x7fc810d94f30
new_flags = 1
addr = {raw = {family = 0, data = '\000' <repeats 13 times>}, inet = {family = 0, port = 0, ip = 0, pad = "\000\000\000\000\000\000\000"}, ipv6 = {family = 0, port = 0, flowinfo = 0,
ip = {_S6_un = {_S6_u8 = '\000' <repeats 15 times>, _S6_u16 = {0, 0, 0, 0, 0, 0, 0, 0}, _S6_u32 = {0, 0, 0, 0}, _S6_u64 = {0, 0}}}, scope_id = 0}, local = {family = 0,
path = '\000' <repeats 30 times>, "\061\071\063.63.63.108\000\000\000`\327!\f\310\177\000\000\017\000\000\000\000\000\000\000p\260I\267\376\177\000\000\000\000\000\000\000\000\000\000\372", '\000' <repeats 15 times>, "\372\000\000\000\000\000\000\000\215", <incomplete sequence \343>}}
#3 0x00007fc80c887a45 in _pr_poll_with_poll (pds=0x7fc811256b40, npds=15, timeout=timeout@entry=250) at ../../../nspr/pr/src/pthreads/ptio.c:3812
in_flags_read = 0
in_flags_write = 0
out_flags_read = 0
out_flags_write = 0
stack_syspoll = {{fd = 50, events = 1, revents = 0}, {fd = 6, events = 1, revents = 0}, {fd = 7, events = 1, revents = 0}, {fd = 8, events = 1, revents = 0}, {fd = 66, events = 1,
revents = 0}, {fd = 80, events = 1, revents = 0}, {fd = 79, events = 1, revents = 0}, {fd = 78, events = 1, revents = 0}, {fd = 77, events = 1, revents = 0}, {fd = 76, events = 1,
revents = 0}, {fd = 75, events = 1, revents = 0}, {fd = 73, events = 1, revents = 0}, {fd = 71, events = 1, revents = 0}, {fd = 70, events = 1, revents = 0}, {fd = 68, events = 1,
revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 1219907217, events = -32767, revents = -1}, {fd = 2, events = 32766, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0,
events = 0, revents = 0}, {fd = 48, events = 91, revents = 0}, {fd = -1219907216, events = 32766, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {
fd = 110, events = 119, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = -1219907217, events = 32766, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = -1219907201,
events = 32766, revents = 0}, {fd = 203544416, events = 32712, revents = 0}, {fd = 124, events = 0, revents = 0}, {fd = 2560, events = 0, revents = 0}, {fd = 1219907089,
events = -32767, revents = -1}, {fd = 3, events = 32712, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 48, events = 91, revents = 0}, {
fd = -1219907088, events = 32766, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 110, events = 119, revents = 0}, {fd = 0, events = 0,
revents = 0}, {fd = -1219907089, events = 32766, revents = 0}, {fd = 210264088, events = 32712, revents = 0}, {fd = 1, events = 0, revents = 0}, {fd = 287047696, events = 32712,
revents = 0}, {fd = -1, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0,
revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 287320512, events = 32712, revents = 0}, {fd = 210265391,
events = 32712, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 281542400, events = 32712, revents = 0}, {fd = 287320512, events = 32712, revents = 0}, {fd = -133551240,
events = 32711, revents = 0}, {fd = 0, events = 0, revents = 0}, {fd = 246979857, events = 32712, revents = 0}, {fd = 5, events = 15, revents = 0}, {fd = -1219906728, events = 32766,
revents = 0}}
syspoll = 0x7ffeb749b070
index = 2
msecs = <optimized out>
ready = 0
start = <optimized out>
elapsed = <optimized out>
remaining = <optimized out>
#4 0x00007fc80c88a655 in PR_Poll (pds=<optimized out>, npds=<optimized out>, timeout=timeout@entry=250) at ../../../nspr/pr/src/pthreads/ptio.c:4324
No locals.
#5 0x00007fc80eb8d789 in slapd_daemon (ports=ports@entry=0x7ffeb749b630) at ldap/servers/slapd/daemon.c:1242
select_return = 0
prerr = <optimized out>
n_tcps = 0x7fc810b6db30
s_tcps = 0x7fc810b6da30
i_unix = 0x7fc810b6da10
fdesp = 0x0
num_poll = 15
pr_timeout = 250
time_thread_p = 0x7fc8111ff350
threads = <optimized out>
in_referral_mode = 0
tp = 0x0
tp_config = {init_flag = 1219906497, initial_threads = -32767, max_threads = 9, stacksize = 0, event_queue_size = 2, work_queue_size = 0, log_fct = 0x0,
log_start_fct = 0xffff800148b64ba1, log_close_fct = 0x7ffe0000000a, malloc_fct = 0x2, calloc_fct = 0x0, realloc_fct = 0x5b00000032, free_fct = 0x7ffeb749b460}
#6 0x00007fc80eb7f253 in main (argc=5, argv=0x7ffeb749bc68) at ldap/servers/slapd/main.c:1143
return_value = 0
slapdFrontendConfig = <optimized out>
ports_info = {n_port = 389, s_port = 636, n_listenaddr = 0x7fc810b6dc40, s_listenaddr = 0x7fc810b6dba0, n_socket = 0x7fc810b6db30, i_listenaddr = 0x7fc810b6db50, i_port = 1,
i_socket = 0x7fc810b6da10, s_socket = 0x7fc810b6da30}
m = <optimized out>
notify = <optimized out>