[Freeipa-users] ns-slapd hangs for 2-3 minutes, then resumes.

Guillermo Fuentes guillermo.fuentes at modernizingmedicine.com
Wed Jun 1 16:37:10 UTC 2016


Hi all,

We are experiencing a similar issue like the one discussed in the
following thread but we are running FreeIPA 4.2 on CentOS 7.2:
https://www.redhat.com/archives/freeipa-users/2015-February/msg00205.html

LDAP service stops responding to queries (hangs). LDAP connections on
the server climb sometimes up to 10 times the normal amount and load
goes to 0. Then, the connections start to drop until they get to a
normal level and the LDAP service starts to respond to queries again.
This happens in between 3-5 minutes:

Time,LDAP conn, Opened files(ns-slapd), File
Desc(ns-slapd),Threads(ns-slapd),Load1,Load5,Load15
8:54:03,101,353,216,142,0.43,0.20,0.16
8:55:02,108,359,221,142,0.19,0.18,0.15
8:56:03,110,361,224,142,0.07,0.15,0.14
8:57:14,117,383,246,142,0.15,0.16,0.15
8:58:04,276,371,234,142,0.05,0.13,0.14
8:59:05,469,371,234,142,0.02,0.11,0.13
9:00:08,719,371,234,142,0.01,0.09,0.12
9:01:18,1060,371,234,142,0.00,0.07,0.12
9:02:10,742,371,233,142,0.10,0.09,0.12
9:03:06,365,372,235,142,0.13,0.10,0.13
9:04:04,262,379,242,142,0.87,0.29,0.19
9:05:02,129,371,233,142,0.51,0.31,0.20
9:06:03,126,377,240,142,0.42,0.33,0.22
9:07:03,125,377,238,142,0.17,0.27,0.21

Nothing is logged in the errors log file of the server having the
problem (ipa1 as an example).
In the replicas this is logged:
8:59:05 -0400] NSMMReplicationPlugin - agmt="cn=meToipa1.example.com"
(ipa1:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.
9:01:05 -0400] NSMMReplicationPlugin - agmt="cn=meToipa1.example.com"
(ipa1:389): Unable to receive the response for a startReplication
extended operation to consumer (Timed out). Will retry later.

Nothing is logged in the access log file until after ns-slapd starts
responding again:
...
8:57:00 -0400] conn=12384 fd=234 slot=234 connection from 172.20.0.1
to 172.20.2.45
8:57:00 -0400] conn=12385 fd=235 slot=235 connection from 172.20.0.1
to 172.20.2.45
8:57:00 -0400] conn=12386 fd=236 slot=236 connection from 172.20.0.1
to 172.20.2.45
8:57:00 -0400] conn=12387 fd=237 slot=237 connection from 172.20.0.1
to 172.20.2.45
8:57:00 -0400] conn=10384 op=1227 EXT oid="2.16.840.1.113730.3.5.12"
name="replication-multimaster-extop"
8:57:00 -0400] conn=12324 op=8 RESULT err=0 tag=101 nentries=1 etime=0
8:57:00 -0400] conn=8838 op=2545 EXT oid="2.16.840.1.113730.3.5.12"
name="replication-multimaster-extop"
8:57:00 -0400] conn=8838 op=2545 RESULT err=0 tag=120 nentries=0 etime=0
8:57:00 -0400] conn=10384 op=1227 RESULT err=0 tag=120 nentries=0 etime=0
8:57:00 -0400] conn=12382 op=-1 fd=170 closed - B1
8:57:00 -0400] conn=12383 op=0 SRCH base="" scope=0
filter="(objectClass=*)" attrs="supportedSASLMechanisms
defaultnamingcontext namingContexts schemanamingcontext saslrealm"
8:57:00 -0400] conn=12384 op=-1 fd=234 closed - B1
8:57:00 -0400] conn=12385 op=0 SRCH base="" scope=0
filter="(objectClass=*)" attrs="supportedSASLMechanisms
defaultnamingcontext namingContexts schemanamingcontext saslrealm"
8:57:00 -0400] conn=12383 op=0 RESULT err=0 tag=101 nentries=1 etime=0
8:57:00 -0400] conn=12386 op=-1 fd=236 closed - B1
8:57:00 -0400] conn=12385 op=0 RESULT err=0 tag=101 nentries=1 etime=0
8:57:00 -0400] conn=12387 op=0 SRCH base="" scope=0
filter="(objectClass=*)" attrs="supportedSASLMechanisms
defaultnamingcontext namingContexts schemanamingcontext saslrealm"
8:57:00 -0400] conn=12387 op=0 RESULT err=0 tag=101 nentries=1 etime=0
8:57:00 -0400] conn=12385 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI
8:57:00 -0400] conn=12387 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI
8:57:00 -0400] conn=12385 op=1 RESULT err=14 tag=97 nentries=0
etime=0, SASL bind in progress
8:57:00 -0400] conn=12383 op=1 BIND dn="" method=sasl version=3 mech=GSSAPI
8:57:00 -0400] conn=10384 op=1228 EXT oid="2.16.840.1.113730.3.5.5"
name="Netscape Replication End Session"
8:57:00 -0400] conn=10384 op=1228 RESULT err=0 tag=120 nentries=0 etime=0
8:57:00 -0400] conn=12383 op=1 RESULT err=14 tag=97 nentries=0
etime=0, SASL bind in progress
9:02:00 -0400] conn=12388 fd=170 slot=170 connection from 172.20.0.1
to 172.20.2.45
9:02:00 -0400] conn=12389 fd=234 slot=234 SSL connection from
172.20.0.24 to 172.20.2.45
9:02:00 -0400] conn=12390 fd=236 slot=236 connection from local to
/var/run/slapd-EXAMPLE-COM.socket
9:02:00 -0400] conn=12391 fd=238 slot=238 connection from 172.20.0.1
to 172.20.2.45
9:02:00 -0400] conn=12392 fd=239 slot=239 SSL connection from
172.20.0.24 to 172.20.2.45
9:02:00 -0400] conn=12393 fd=240 slot=240 connection from local to
/var/run/slapd-EXAMPLE-COM.socket
9:02:00 -0400] conn=12394 fd=241 slot=241 connection from 172.20.0.1
to 172.20.2.45
9:02:00 -0400] conn=12395 fd=242 slot=242 SSL connection from
172.20.0.24 to 172.20.2.45
9:02:00 -0400] conn=12396 fd=243 slot=243 connection from 172.20.0.1
to 172.20.2.45
9:02:00 -0400] conn=12397 fd=244 slot=244 SSL connection from
172.20.0.24 to 172.20.2.45
9:02:00 -0400] conn=12398 fd=245 slot=245 connection from 172.20.0.1
to 172.20.2.45
9:02:00 -0400] conn=12400 fd=247 slot=247 connection from 172.20.0.1
to 172.20.2.45
9:02:00 -0400] conn=12401 fd=248 slot=248 connection from 172.20.0.1
to 172.20.2.45
...
9:02:00 -0400] conn=12390 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI
9:02:00 -0400] conn=12388 op=-1 fd=170 closed - B1
9:02:00 -0400] conn=12393 op=0 BIND dn="" method=sasl version=3 mech=GSSAPI
9:02:00 -0400] conn=12391 op=0 SRCH base="" scope=0
filter="(objectClass=*)" attrs="supportedSASLMechanisms
defaultnamingcontext namingContexts schemanamingcontext saslrealm"
9:02:00 -0400] conn=12394 op=-1 fd=241 closed - B1
9:02:00 -0400] conn=12391 op=0 RESULT err=0 tag=101 nentries=1 etime=0
9:02:00 -0400] conn=12396 op=0 SRCH base="" scope=0
filter="(objectClass=*)" attrs="supportedSASLMechanisms
defaultnamingcontext namingContexts schemanamingcontext saslrealm"
9:02:00 -0400] conn=12396 op=0 RESULT err=0 tag=101 nentries=1 etime=0
9:02:00 -0400] conn=12398 op=-1 fd=245 closed - B1
9:02:00 -0400] conn=12400 op=0 SRCH base="" scope=0
filter="(objectClass=*)" attrs="supportedSASLMechanisms
defaultnamingcontext namingContexts schemanamingcontext saslrealm"
9:02:00 -0400] conn=12400 op=0 RESULT err=0 tag=101 nentries=1 etime=0
9:02:00 -0400] conn=12401 op=-1 fd=248 closed - B1
9:02:00 -0400] conn=12391 op=1 ABANDON targetop=NOTFOUND msgid=1
9:02:00 -0400] conn=12396 op=1 ABANDON targetop=NOTFOUND msgid=1
9:02:00 -0400] conn=12400 op=1 ABANDON targetop=NOTFOUND msgid=1
9:02:00 -0400] conn=12391 op=2 UNBIND
9:02:00 -0400] conn=12396 op=2 UNBIND
9:02:00 -0400] conn=12391 op=2 fd=238 closed - U1
9:02:00 -0400] conn=12396 op=2 fd=243 closed - U1
9:02:00 -0400] conn=12400 op=2 UNBIND
9:02:00 -0400] conn=12400 op=2 fd=247 closed - U1
...


Environment:
# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

# rpm -qa ipa*
ipa-server-4.2.0-15.0.1.el7.centos.6.1.x86_64
ipa-python-4.2.0-15.0.1.el7.centos.6.1.x86_64
ipa-admintools-4.2.0-15.0.1.el7.centos.6.1.x86_64
ipa-server-trust-ad-4.2.0-15.0.1.el7.centos.6.1.x86_64
ipa-client-4.2.0-15.0.1.el7.centos.6.1.x86_64
ipa-server-dns-4.2.0-15.0.1.el7.centos.6.1.x86_64

# rpm -qa 389*
389-ds-base-libs-1.3.4.0-30.el7_2.x86_64
389-ds-base-1.3.4.0-30.el7_2.x86_64

We have 4 FreeIPA servers with replication working fine between them.
ipa1 is handling LDAP authentication for +400 clients and has been
tunned as recommended per
https://access.redhat.com/documentation/en-US/Red_Hat_Directory_Server/8.2/html/Performance_Tuning_Guide/system-tuning.html

Is this a known issue?
Any idea what can be causing ns-slapd to hang?

Thanks in advance!

Guillermo




More information about the Freeipa-users mailing list