[Freeipa-users] SLAPD stops answering

Troels Hansen th at casalogic.dk
Mon Jan 9 16:16:24 UTC 2017


Yes, the packages are installed, and I will generate a dump the next time it happens.

Yes, the failing RetroCL message always reports the same changenumber. It is repeated about 2-4 times per second.
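
To put a number on "the same changenumber, 2-4 times per second", the error log can be tallied with a short script. This is only a sketch: the regular expression is modelled on the RetroCL lines quoted further down in this thread, the helper names are made up for illustration, and the timestamp parsing assumes an English/C locale for the month abbreviation.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern modelled on the DSRetroclPlugin lines quoted in this thread
# (the log itself spells "occured"; both spellings are accepted).
RETROCL_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] DSRetroclPlugin - replog: an error occurr?ed while "
    r"adding change number (?P<num>\d+),"
)
# Timestamp layout as it appears in the quoted log lines.
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def summarize_retrocl_errors(lines):
    """Return {changenumber: (count, first_timestamp, last_timestamp)}."""
    counts = Counter()
    first = {}
    last = {}
    for line in lines:
        match = RETROCL_RE.match(line)
        if not match:
            continue  # not a RetroCL "adding change number" error
        num = int(match.group("num"))
        ts = datetime.strptime(match.group("ts"), TS_FORMAT)
        counts[num] += 1
        first.setdefault(num, ts)  # keep the earliest occurrence
        last[num] = ts             # overwrite with the latest occurrence
    return {num: (counts[num], first[num], last[num]) for num in counts}


def rate_per_second(count, first_ts, last_ts):
    """Average messages per second over the observed span (inclusive)."""
    span = (last_ts - first_ts).total_seconds() + 1
    return count / span
```

Passing the open error log file to `summarize_retrocl_errors` gives one entry per changenumber. A single entry with a very large count would confirm that the plugin keeps retrying the same change.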

----- On Jan 9, 2017, at 2:54 PM, Ludwig Krispenz <lkrispen at redhat.com> wrote: 

> Hi,

> there seem to be two issues here, maybe related: a hanging slapd process and the
> retro CL errors.

> If the slapd process is not responding, can we get a pstack or gdb backtrace
> (http://www.port389.org/docs/389ds/FAQ/faq.html#debug_crashes) of the process?
> About the Retro CL messages, is it always the same changenumber that is
> reported?

> On 01/09/2017 02:06 PM, Troels Hansen wrote:

>> Hi, we have an IPA installation, which obviously needs upgrading.
>> It's a single server on RHEL 7.1 running IPA 4.1.

>> However, it has been running smoothly until now.

>> Rebooting gets everything running again, but only for a few days.

>> It looks like everything fails around 00:17:47 and comes up again just before
>> 08:00, when the server is rebooted.

>> Jan 6 00:19:46 fbbidm01 winbindd[2965]: failed to bind to server
>> ldapi://%2fvar%2frun%2fslapd-DOMAIN.LAN.socket with dn="[Anonymous bind]"
>> Error: Local error
>> Jan 6 00:19:46 fbbidm01 winbindd[2965]: (unknown)
>> Jan 6 00:20:29 fbbidm01 winbindd[2965]: [2017/01/06 00:20:29.758332, 0]
>> ipa_sam.c:4128(bind_callback_cleanup)

>> Looking at the SLAPD logs also reveals it stopped answering:

>> [06/Jan/2017:00:17:47 +0100] conn=40702 op=62 SRCH
>> base="cn=radius_aura_admin,cn=groups,cn=accounts,dc=domain,dc=lan" scope=0
>> filter="(objectClass=*)" attrs="cn"
>> [06/Jan/2017:00:17:47 +0100] conn=40702 op=62 RESULT err=0 tag=101 nentries=1
>> etime=0
>> [06/Jan/2017:00:17:47 +0100] conn=40702 op=63 SRCH
>> base="cn=radius_users,cn=groups,cn=accounts,dc=domain,dc=lan" scope=0
>> filter="(objectClass=*)" attrs="cn"
>> [06/Jan/2017:00:17:47 +0100] conn=40702 op=63 RESULT err=0 tag=101 nentries=1
>> etime=0
>> [06/Jan/2017:00:17:47 +0100] conn=40702 op=64 SRCH
>> base="cn=system_radius_users,cn=groups,cn=accounts,dc=domain,dc=lan" scope=0
>> filter="(objectClass=*)" attrs="cn"
>> [06/Jan/2017:00:17:47 +0100] conn=40702 op=64 RESULT err=0 tag=101 nentries=1
>> etime=0
>> [06/Jan/2017:00:17:48 +0100] conn=40702 op=65 SRCH
>> base="cn=accounts,dc=domain,dc=lan" scope=2 filter="(uid=sys_prov_aura)"
>> attrs=ALL
>> [06/Jan/2017:00:17:48 +0100] conn=40702 op=65 RESULT err=0 tag=101 nentries=1
>> etime=0
>> [06/Jan/2017:00:17:48 +0100] conn=40702 op=66 BIND
>> dn="uid=sys_prov_aura,cn=users,cn=accounts,dc=domain,dc=lan" method=128
>> version=3
>> [06/Jan/2017:00:17:48 +0100] conn=40702 op=66 RESULT err=0 tag=97 nentries=0
>> etime=0 dn="uid=sys_prov_aura,cn=users,cn=accounts,dc=domain,dc=lan"
>> [06/Jan/2017:00:17:51 +0100] conn=40703 fd=158 slot=158 connection from
>> 10.250.8.66 to 10.250.8.58
>> [06/Jan/2017:00:17:53 +0100] conn=40704 fd=159 slot=159 SSL connection from
>> 10.250.8.37 to 10.250.8.58
>> [06/Jan/2017:00:18:02 +0100] conn=40705 fd=160 slot=160 SSL connection from
>> 10.250.8.57 to 10.250.8.58
>> [06/Jan/2017:00:18:02 +0100] conn=40706 fd=161 slot=161 SSL connection from
>> 10.250.20.102 to 10.250.8.58
>> [06/Jan/2017:00:18:03 +0100] conn=40707 fd=162 slot=162 SSL connection from
>> 10.250.20.102 to 10.250.8.58
>> [06/Jan/2017:00:18:58 +0100] conn=40708 fd=163 slot=163 connection from
>> 10.250.8.66 to 10.250.8.58
>> [06/Jan/2017:00:19:03 +0100] conn=40709 fd=164 slot=164 connection from local to
>> /var/run/slapd-DOMAIN.LAN.socket
>> [06/Jan/2017:00:19:35 +0100] conn=40710 fd=165 slot=165 connection from
>> 10.250.8.58 to 10.250.8.58
>> [06/Jan/2017:00:19:35 +0100] conn=40711 fd=166 slot=166 connection from
>> 10.150.27.7 to 10.250.8.58
>> [06/Jan/2017:00:19:43 +0100] conn=40712 fd=167 slot=167 SSL connection from
>> 10.250.20.102 to 10.250.8.58
>> [06/Jan/2017:00:19:46 +0100] conn=40713 fd=168 slot=168 connection from local to
>> /var/run/slapd-DOMAIN.LAN.socket

>> It looks like it just stops answering at 00:17:48.
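
The pattern in the access log above (operations completing until 00:17:48, then only new connections with no operations on them) can be picked out mechanically. The following is a rough sketch under the assumption that the access log uses exactly the line layout quoted above, with each entry on a single line. The function name is hypothetical and timestamps are kept as plain strings.

```python
import re

# A new connection line, e.g. "conn=40703 fd=158 slot=158 connection from ..."
# (also matches "SSL connection from" and "connection from local to").
CONN_OPEN_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] conn=(?P<conn>\d+) fd=\d+ slot=\d+ .*connection from"
)
# Any operation line on a connection, e.g. "conn=40702 op=66 BIND ...".
OP_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] conn=(?P<conn>\d+) op=-?\d+ (?P<kind>\S+)"
)


def find_stall(lines):
    """Return (timestamp of last RESULT, [(conn, opened_ts)] with no operations)."""
    last_result_ts = None
    opened = {}     # conn id -> timestamp at which the connection was accepted
    active = set()  # conn ids that logged at least one operation
    for line in lines:
        match = CONN_OPEN_RE.match(line)
        if match:
            opened[int(match.group("conn"))] = match.group("ts")
            continue
        match = OP_RE.match(line)
        if match:
            active.add(int(match.group("conn")))
            if match.group("kind") == "RESULT":
                last_result_ts = match.group("ts")
    idle = [(conn, ts) for conn, ts in opened.items() if conn not in active]
    return last_result_ts, idle
```

Run against the excerpt above, this should report 00:17:48 as the last completed operation and list conn=40703 onward as accepted but never served, which matches the picture of a server that still accepts connections but no longer processes requests.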

>> The slapd error log reveals nothing:

>> [06/Jan/2017:00:17:34 +0100] DSRetroclPlugin - replog: an error occured while
>> adding change number 3875312, dn = changenumber=3875312,cn=changelog: Already
>> exists.
>> [06/Jan/2017:00:17:34 +0100] retrocl-plugin - retrocl_postob: operation failure
>> [68]
>> [06/Jan/2017:07:57:21 +0100] - slapd shutting down - signaling operation threads
>> - op stack size 0 max work q size 735 max work q stack size 23
>> [06/Jan/2017:07:57:21 +0100] - slapd shutting down - waiting for 30 threads to
>> terminate
>> [06/Jan/2017:07:58:02 +0100] SSL Initialization - Configured SSL version range:
>> min: TLS1.0, max: TLS1.2

>> However, I see a huge number of these lines in the error log:

>> DSRetroclPlugin - replog: an error occured while adding change number 3875312,
>> dn = changenumber=3875312,cn=changelog: Already exists.

>> Does anyone have thoughts about this, other than "Just upgrade"?

>> --

>> Kind regards

>> Troels Hansen
>> System consultant
>> Casalogic A/S
>> T (+45) 70 20 10 63
>> M (+45) 22 43 71 57

>> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos and
>> much more.

> --
> Red Hat GmbH, http://www.de.redhat.com/ , Registered seat: Grasbrunn,
> Commercial register: Amtsgericht Muenchen, HRB 153243,
> Managing Directors: Charles Cachera, Michael Cunningham, Michael O'Neill, Eric
> Shander
