[Freeipa-users] ns-slapd hang/segfault

Simo Sorce simo at redhat.com
Wed Dec 21 21:43:02 UTC 2011


On Wed, 2011-12-21 at 15:33 -0500, Dan Scott wrote:
> On Wed, Dec 21, 2011 at 14:10, Dan Scott <danieljamesscott at gmail.com> wrote:
> > On Mon, Dec 19, 2011 at 15:26, Dan Scott <danieljamesscott at gmail.com> wrote:
> >> On Mon, Dec 19, 2011 at 14:14, Simo Sorce <simo at redhat.com> wrote:
> >>> On Mon, 2011-12-19 at 11:01 -0500, Dan Scott wrote:
> >>>> On Thu, Dec 15, 2011 at 11:51, Rich Megginson <rmeggins at redhat.com> wrote:
> >>>> > On 12/15/2011 09:48 AM, Dan Scott wrote:
> >>>> >>
> >>>> >> Hi,
> >>>> >>
> >>>> >> On Thu, Dec 15, 2011 at 10:58, Rich Megginson <rmeggins at redhat.com> wrote:
> >>>> >>>
> >>>> >>> On 12/15/2011 08:41 AM, Dan Scott wrote:
> >>>> >>>>
> >>>> >>>> Hi,
> >>>> >>>>
> >>>> >>>> On my Fedora 15 FreeIPA server, I'm having some problems with
> >>>> >>>> stability. The server appears to 'hang' and stops responding to LDAP
> >>>> >>>> lookups. When I restart the dirsrv service, I get:
> >>>> >>>>
> >>>> >>>> Dec 15 09:40:02 ohm kernel: [254566.011404] ns-slapd[28910]: segfault
> >>>> >>>> at 17d ip 00007f00dbc0208c sp 00007fff929b7848 error 4 in
> >>>> >>>> libc-2.14.so[7f00dbb87000+18f000]
> >>>> >>>>
> >>>> >>>> and the /var/log/dirsrv/slapd-EXAMPLE-COM/errors contains
> >>>> >>>>
> >>>> >>>> [15/Dec/2011:09:47:35 -0500] set_krb5_creds - Could not get initial
> >>>> >>>> credentials for principal [ldap/example.com at EXAMPLE.COM] in keytab
> >>>> >>>> [WRFILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC
> >>>> >>>> for requested realm)
> >>>> >>>> [15/Dec/2011:09:47:35 -0500] slapd_ldap_sasl_interactive_bind - Error:
> >>>> >>>> could not perform interactive bind for id [] mech [GSSAPI]: error -2
> >>>> >>>> (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified
> >>>> >>>> GSS failure.  Minor code may provide more information (Credentials
> >>>> >>>> cache file '/tmp/krb5cc_496' not found))
> >>>> >>>>
> >>>> >>>> This is happening very frequently, I'm having to restart the dirsrv
> >>>> >>>> process once an hour, otherwise people start complaining.
> >>>> >>>>
> >>>> >>>> I experienced similar problems with FreeIPA 1, when I was using Fedora
> >>>> >>>> 14 and earlier, and had to regularly (also once per hour) restart the
> >>>> >>>> dirsrv process. Could this be related?
> >>>> >>>>
> >>>> >>>> I also noticed this:
> >>>> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=730387
> >>>> >>>>
> >>>> >>>> There are updates in 'updates-testing' which I believe fix the above
> >>>> >>>> issue, but I'm reluctant to install from a testing repo on my
> >>>> >>>> production server, can anyone report any feedback on this?
> >>>> >>>
> >>>> >>> The above bug does not cause a segfault.
> >>>> >>> What version of 389-ds-base are you using?
> >>>> >>
> >>>> >> [root at ohm ~]# rpm -qa|grep 389
> >>>> >> 389-ds-base-libs-1.2.10-0.4.a4.fc15.x86_64
> >>>> >> 389-ds-base-1.2.10-0.4.a4.fc15.x86_64
> >>>> >> [root at ohm ~]#
> >>>> >
> >>>> > a4 is alpha software.  Not sure how that got released to stable.
> >>>> >
> >>>> >>> Please enable the collection of core dumps so we can debug the crash -
> >>>> >>> see
> >>>> >>> http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
> >>>> >>
> >>>> >> OK. I think there is a small typo in the instructions:
> >>>> >>
> >>>> >> 'debuginfo-install 389-ds-base-debuginfo' should be 'debuginfo-install
> >>>> >> 389-ds-base'
> >>>> >
> >>>> > Thanks.  Fixed.
> >>>> >
> >>>> >> I managed to get the core dump (attached - so I only sent this message
> >>>> >> to you, not the list as well), but it doesn't contain much
> >>>> >> information.
> >>>> >
> >>>> > This is https://bugzilla.redhat.com/show_bug.cgi?id=755725
> >>>> >
> >>>> > Will be fixed in 1.2.10.a6
> >>>> >
> >>>> > But this still doesn't explain your kerberos errors.
> >>>>
> >>>> An additional problem is also occurring. I've been finding that the:
> >>>>
> >>>> /etc/dirsrv/slapd-EXAMPLE-COM/dse.ldif
> >>>>
> >>>> file is empty and prevents dirsrv from starting. I can restore it from
> >>>> dse.ldif.bak or dse.ldif.startOK, but this may be related to the LDAP
> >>>> problems that I'm having?
> >>>
> >>> This is an upgrade time problem, it should be fixed in latest packages.
> >>> Did you recently upgrade freeipa packages if so from what version to
> >>> what version ?
> >>
> >> The 0 length file doesn't appear related to upgrades. Possibly it only
> >> happens on the first service restart after an upgrade?
> >>
> >> It's happened at least 4 times since the last freeipa package upgrade
> >> on 4th November, so it seems to be happening too regularly to be the
> >> result of an upgrade.
> >>
> >> [root at curie ~]# grep freeipa /var/log/yum.log
> >> Sep 06 16:56:51 Installed: freeipa-python-2.0.1-2.fc15.x86_64
> >> Sep 06 17:00:13 Installed: freeipa-client-2.0.1-2.fc15.x86_64
> >> Sep 06 17:00:14 Installed: freeipa-admintools-2.0.1-2.fc15.x86_64
> >> Sep 06 17:01:52 Installed: freeipa-server-selinux-2.0.1-2.fc15.x86_64
> >> Sep 06 17:01:56 Installed: freeipa-server-2.0.1-2.fc15.x86_64
> >> Sep 08 11:23:35 Updated: freeipa-python-2.1.0-1.fc15.x86_64
> >> Sep 08 11:23:41 Updated: freeipa-client-2.1.0-1.fc15.x86_64
> >> Sep 08 11:23:41 Updated: freeipa-admintools-2.1.0-1.fc15.x86_64
> >> Sep 08 11:25:00 Updated: freeipa-server-selinux-2.1.0-1.fc15.x86_64
> >> Sep 08 11:26:06 Updated: freeipa-server-2.1.0-1.fc15.x86_64
> >> Nov 04 15:46:43 Updated: freeipa-python-2.1.3-2.fc15.x86_64
> >> Nov 04 15:52:48 Updated: freeipa-client-2.1.3-2.fc15.x86_64
> >> Nov 04 15:52:48 Updated: freeipa-admintools-2.1.3-2.fc15.x86_64
> >> Nov 04 15:54:47 Updated: freeipa-server-2.1.3-2.fc15.x86_64
> >> Nov 04 15:56:02 Updated: freeipa-server-selinux-2.1.3-2.fc15.x86_64
> >>
> >> Dan
> >
> > I'm still having fairly serious problems. I keep getting:
> >
> > ipa: ERROR: Kerberos error: Kerberos error: ('Unspecified GSS failure.
> >  Minor code may provide more information', 851968)/('Cannot contact
> > any KDC for requested realm', -1765328228)/
> >
> > Whenever I try and run IPA commands on either of my servers, or a
> > client with the admin tools installed.
> >
> > The server logs contain:
> >
> > slapd_ldap_sasl_interactive_bind - Error: could not perform
> > interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP
> > server) ((null))
> > slapi_ldap_bind - Error: could not perform interactive bind for id []
> > mech [GSSAPI]: error -1 (Can't contact LDAP server)
> >
> > And I can't create new replicas because they fail with:
> >
> > 2011-12-21 11:25:58,356 DEBUG Failed to start replication
> >  File "/usr/sbin/ipa-replica-install", line 484, in <module>
> >    main()
> >
> >  File "/usr/sbin/ipa-replica-install", line 435, in main
> >    ds = install_replica_ds(config)
> >
> >  File "/usr/sbin/ipa-replica-install", line 137, in install_replica_ds
> >    pkcs12_info)
> >
> >  File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py",
> > line 284, in create_replica
> >    self.start_creation("Configuring directory server", 60)
> >
> >  File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py",
> > line 248, in start_creation
> >    method()
> >
> >  File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py",
> > line 297, in __setup_replica
> >    r_bindpw=self.dm_password)
> >
> >  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
> > line 694, in setup_replication
> >    raise RuntimeError("Failed to start replication")
> >
> > Can someone help me? This is getting fairly serious because I can't
> > create/modify anything and I'm worried that there will be problems
> > with existing users soon as well.
> 
> OK, I think I'm narrowing in on this. It looks like the replication
> agreement is broken and the servers have got out of sync:

odd

> On the 'master' server (which contains the PKI dirsrv process):

The PKI instance uses a different set of replication agreements, so you
can't see those agreements with ipa-replica-manage, which handles only
the IPA IdM instance.

> [root at fileserver1 ~]# ipa-replica-manage list
> fileserver1.example.com: master
> 
> On the other server:
> 
> [root at fileserver2 ~]# ipa-replica-manage list
> fileserver1.example.com: master
> fileserver2.example.com: master

strange indeed.

> When I try and add the missing replication:
> 
> [root at fileserver1 ~]# ipa-replica-manage connect fileserver2.example.com
> unexpected error: list index out of range
> 
> Do I need to delete the replication from fileserver2?

You can't remove a replication agreement if it is the only agreement you
have. This is to avoid split-brain situations.

Not sure how to handle a disappeared agreement, though. It's
theoretically not possible unless you 'inadvertently' ran
ipa-replica-manage --force del fileserver2 on fileserver1 ...

Can you look into cn=config and see if you have references to
fileserver2?
Maybe it is just a bug in displaying the actually active replicas.
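
One way to do that check is sketched below. It is only a sketch and makes
some assumptions: that you bind as "cn=Directory Manager", that the
agreements live under "cn=mapping tree,cn=config" (where 389 DS normally
keeps them), and that the helper names agreement_hosts and list_agreements
are made up here for illustration. Run it on fileserver1 and see whether
fileserver2 shows up.

```shell
# Read LDIF on stdin and print the value of every nsDS5ReplicaHost
# attribute, i.e. the remote host each replication agreement points at.
# (Hypothetical helper name, not part of any IPA or 389 DS tool.)
agreement_hosts() {
    awk 'tolower($1) == "nsds5replicahost:" { print $2 }'
}

# Search cn=config for replication agreement entries.
# Assumption: simple bind as Directory Manager, password prompted by -W.
list_agreements() {
    ldapsearch -x -D "cn=Directory Manager" -W \
        -b "cn=mapping tree,cn=config" \
        "(objectClass=nsds5ReplicationAgreement)" nsDS5ReplicaHost
}

# Usage on the server being checked:
#   list_agreements | agreement_hosts
```

If fileserver2.example.com is printed, the agreement still exists in the
directory and the problem is more likely in how the replicas are listed.
If nothing is printed, the agreement really is gone on that side.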

> As an aside, there are some errors in the documentation for
> ipa-replica-manage. Some of the examples have 'ipa replica-manage'
> instead of 'ipa-replica-manage' (space instead of '-').

Thanks, will file a doc bug.

Simo.

-- 
Simo Sorce * Red Hat, Inc * New York



