[Freeipa-users] ns-slapd hang/segfault

Dan Scott danieljamesscott at gmail.com
Wed Dec 21 22:39:26 UTC 2011


On Wed, Dec 21, 2011 at 16:43, Simo Sorce <simo at redhat.com> wrote:
> On Wed, 2011-12-21 at 15:33 -0500, Dan Scott wrote:
>> On Wed, Dec 21, 2011 at 14:10, Dan Scott <danieljamesscott at gmail.com> wrote:
>> > On Mon, Dec 19, 2011 at 15:26, Dan Scott <danieljamesscott at gmail.com> wrote:
>> >> On Mon, Dec 19, 2011 at 14:14, Simo Sorce <simo at redhat.com> wrote:
>> >>> On Mon, 2011-12-19 at 11:01 -0500, Dan Scott wrote:
>> >>>> On Thu, Dec 15, 2011 at 11:51, Rich Megginson <rmeggins at redhat.com> wrote:
>> >>>> > On 12/15/2011 09:48 AM, Dan Scott wrote:
>> >>>> >>
>> >>>> >> Hi,
>> >>>> >>
>> >>>> >> On Thu, Dec 15, 2011 at 10:58, Rich Megginson <rmeggins at redhat.com> wrote:
>> >>>> >>>
>> >>>> >>> On 12/15/2011 08:41 AM, Dan Scott wrote:
>> >>>> >>>>
>> >>>> >>>> Hi,
>> >>>> >>>>
>> >>>> >>>> On my Fedora 15 FreeIPA server, I'm having some problems with
>> >>>> >>>> stability. The server appears to 'hang' and stops responding to LDAP
>> >>>> >>>> lookups. When I restart the dirsrv service, I get:
>> >>>> >>>>
>> >>>> >>>> Dec 15 09:40:02 ohm kernel: [254566.011404] ns-slapd[28910]: segfault
>> >>>> >>>> at 17d ip 00007f00dbc0208c sp 00007fff929b7848 error 4 in
>> >>>> >>>> libc-2.14.so[7f00dbb87000+18f000]
>> >>>> >>>>
>> >>>> >>>> and the /var/log/dirsrv/slapd-EXAMPLE-COM/errors contains
>> >>>> >>>>
>> >>>> >>>> [15/Dec/2011:09:47:35 -0500] set_krb5_creds - Could not get initial
>> >>>> >>>> credentials for principal [ldap/example.com at EXAMPLE.COM] in keytab
>> >>>> >>>> [WRFILE:/etc/dirsrv/ds.keytab]: -1765328228 (Cannot contact any KDC
>> >>>> >>>> for requested realm)
>> >>>> >>>> [15/Dec/2011:09:47:35 -0500] slapd_ldap_sasl_interactive_bind - Error:
>> >>>> >>>> could not perform interactive bind for id [] mech [GSSAPI]: error -2
>> >>>> >>>> (Local error) (SASL(-1): generic failure: GSSAPI Error: Unspecified
>> >>>> >>>> GSS failure.  Minor code may provide more information (Credentials
>> >>>> >>>> cache file '/tmp/krb5cc_496' not found))
>> >>>> >>>>
>> >>>> >>>> This is happening very frequently; I'm having to restart the dirsrv
>> >>>> >>>> process once an hour, otherwise people start complaining.
>> >>>> >>>>
>> >>>> >>>> I experienced similar problems with FreeIPA 1, when I was using Fedora
>> >>>> >>>> 14 and earlier, and had to regularly (also once per hour) restart the
>> >>>> >>>> dirsrv process. Could this be related?
>> >>>> >>>>
>> >>>> >>>> I also noticed this:
>> >>>> >>>> https://bugzilla.redhat.com/show_bug.cgi?id=730387
>> >>>> >>>>
>> >>>> >>>> There are updates in 'updates-testing' which I believe fix the above
>> >>>> >>>> issue, but I'm reluctant to install from a testing repo on my
>> >>>> >>>> production server, can anyone report any feedback on this?
>> >>>> >>>
>> >>>> >>> The above bug does not cause a segfault.
>> >>>> >>> What version of 389-ds-base are you using?
>> >>>> >>
>> >>>> >> [root at ohm ~]# rpm -qa|grep 389
>> >>>> >> 389-ds-base-libs-1.2.10-0.4.a4.fc15.x86_64
>> >>>> >> 389-ds-base-1.2.10-0.4.a4.fc15.x86_64
>> >>>> >> [root at ohm ~]#
>> >>>> >
>> >>>> > a4 is alpha software.  Not sure how that got released to stable.
>> >>>> >
>> >>>> >>> Please enable the collection of core dumps so we can debug the crash -
>> >>>> >>> see
>> >>>> >>> http://directory.fedoraproject.org/wiki/FAQ#Debugging_Crashes
>> >>>> >>
>> >>>> >> OK. I think there is a small typo in the instructions:
>> >>>> >>
>> >>>> >> 'debuginfo-install 389-ds-base-debuginfo' should be 'debuginfo-install
>> >>>> >> 389-ds-base'
>> >>>> >
>> >>>> > Thanks.  Fixed.
>> >>>> >
>> >>>> >> I managed to get the core dump (attached - so I only sent this message
>> >>>> >> to you, not the list as well), but it doesn't contain much
>> >>>> >> information.
>> >>>> >
>> >>>> > This is https://bugzilla.redhat.com/show_bug.cgi?id=755725
>> >>>> >
>> >>>> > Will be fixed in 1.2.10.a6
>> >>>> >
>> >>>> > But this still doesn't explain your kerberos errors.
>> >>>>
>> >>>> An additional problem is also occurring. I've been finding that the:
>> >>>>
>> >>>> /etc/dirsrv/slapd-EXAMPLE-COM/dse.ldif
>> >>>>
>> >>>> file is empty, which prevents dirsrv from starting. I can restore it from
>> >>>> dse.ldif.bak or dse.ldif.startOK, but could this be related to the LDAP
>> >>>> problems that I'm having?
>> >>>
>> >>> This is an upgrade-time problem; it should be fixed in the latest packages.
>> >>> Did you recently upgrade the freeipa packages? If so, from what version
>> >>> to what version?
>> >>
>> >> The zero-length file doesn't appear to be related to upgrades. Possibly it
>> >> only happens on the first service restart after an upgrade?
>> >>
>> >> It's happened at least 4 times since the last freeipa package upgrade
>> >> on 4th November, so it seems to be happening too regularly to be the
>> >> result of an upgrade.
>> >>
>> >> [root at curie ~]# grep freeipa /var/log/yum.log
>> >> Sep 06 16:56:51 Installed: freeipa-python-2.0.1-2.fc15.x86_64
>> >> Sep 06 17:00:13 Installed: freeipa-client-2.0.1-2.fc15.x86_64
>> >> Sep 06 17:00:14 Installed: freeipa-admintools-2.0.1-2.fc15.x86_64
>> >> Sep 06 17:01:52 Installed: freeipa-server-selinux-2.0.1-2.fc15.x86_64
>> >> Sep 06 17:01:56 Installed: freeipa-server-2.0.1-2.fc15.x86_64
>> >> Sep 08 11:23:35 Updated: freeipa-python-2.1.0-1.fc15.x86_64
>> >> Sep 08 11:23:41 Updated: freeipa-client-2.1.0-1.fc15.x86_64
>> >> Sep 08 11:23:41 Updated: freeipa-admintools-2.1.0-1.fc15.x86_64
>> >> Sep 08 11:25:00 Updated: freeipa-server-selinux-2.1.0-1.fc15.x86_64
>> >> Sep 08 11:26:06 Updated: freeipa-server-2.1.0-1.fc15.x86_64
>> >> Nov 04 15:46:43 Updated: freeipa-python-2.1.3-2.fc15.x86_64
>> >> Nov 04 15:52:48 Updated: freeipa-client-2.1.3-2.fc15.x86_64
>> >> Nov 04 15:52:48 Updated: freeipa-admintools-2.1.3-2.fc15.x86_64
>> >> Nov 04 15:54:47 Updated: freeipa-server-2.1.3-2.fc15.x86_64
>> >> Nov 04 15:56:02 Updated: freeipa-server-selinux-2.1.3-2.fc15.x86_64
>> >>
>> >> Dan
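The manual restore of an empty dse.ldif described in this thread can be scripted. A minimal sketch, assuming the instance directory is /etc/dirsrv/slapd-EXAMPLE-COM and that the dirsrv service has been stopped first; restore_empty_dse is a hypothetical helper, and trying dse.ldif.bak before dse.ldif.startOK only follows the order Dan lists, not documented 389-ds behavior.

```shell
#!/bin/sh
# Sketch only: restore_empty_dse is a hypothetical helper, not a 389-ds tool.
# Run it while the instance is stopped (assumption: a running server may
# rewrite dse.ldif and overwrite the restored copy).
restore_empty_dse() {
    dir=$1
    # [ -s FILE ] is true only when FILE exists and is non-empty.
    if [ -s "$dir/dse.ldif" ]; then
        echo "dse.ldif is non-empty; nothing to do"
        return 0
    fi
    # Try the backups in the order mentioned in the thread (an assumption).
    for backup in dse.ldif.bak dse.ldif.startOK; do
        if [ -s "$dir/$backup" ]; then
            # -p keeps the backup's owner and mode when run as root.
            cp -p "$dir/$backup" "$dir/dse.ldif" || return 1
            echo "restored dse.ldif from $backup"
            return 0
        fi
    done
    echo "no non-empty backup found in $dir" >&2
    return 1
}

# Usage (path assumed from the errors log location above):
#   restore_empty_dse /etc/dirsrv/slapd-EXAMPLE-COM
```

Keeping the check separate from the copy means the helper is safe to run on a healthy instance: it reports and leaves a non-empty dse.ldif alone.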
>> >
>> > I'm still having fairly serious problems. I keep getting:
>> >
>> > ipa: ERROR: Kerberos error: Kerberos error: ('Unspecified GSS failure.
>> >  Minor code may provide more information', 851968)/('Cannot contact
>> > any KDC for requested realm', -1765328228)/
>> >
>> > Whenever I try and run IPA commands on either of my servers, or a
>> > client with the admin tools installed.
>> >
>> > The server logs contain:
>> >
>> > slapd_ldap_sasl_interactive_bind - Error: could not perform
>> > interactive bind for id [] mech [GSSAPI]: error -1 (Can't contact LDAP
>> > server) ((null))
>> > slapi_ldap_bind - Error: could not perform interactive bind for id []
>> > mech [GSSAPI]: error -1 (Can't contact LDAP server)
>> >
>> > And I can't create new replicas because they fail with:
>> >
>> > 2011-12-21 11:25:58,356 DEBUG Failed to start replication
>> >  File "/usr/sbin/ipa-replica-install", line 484, in <module>
>> >    main()
>> >
>> >  File "/usr/sbin/ipa-replica-install", line 435, in main
>> >    ds = install_replica_ds(config)
>> >
>> >  File "/usr/sbin/ipa-replica-install", line 137, in install_replica_ds
>> >    pkcs12_info)
>> >
>> >  File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py",
>> > line 284, in create_replica
>> >    self.start_creation("Configuring directory server", 60)
>> >
>> >  File "/usr/lib/python2.7/site-packages/ipaserver/install/service.py",
>> > line 248, in start_creation
>> >    method()
>> >
>> >  File "/usr/lib/python2.7/site-packages/ipaserver/install/dsinstance.py",
>> > line 297, in __setup_replica
>> >    r_bindpw=self.dm_password)
>> >
>> >  File "/usr/lib/python2.7/site-packages/ipaserver/install/replication.py",
>> > line 694, in setup_replication
>> >    raise RuntimeError("Failed to start replication")
>> >
>> > Can someone help me? This is getting fairly serious because I can't
>> > create/modify anything and I'm worried that there will be problems
>> > with existing users soon as well.
>>
>> OK, I think I'm narrowing in on this. It looks like the replication
>> agreement is broken and the servers have got out of sync:
>
> odd
>
>> On the 'master' server (which contains the PKI dirsrv process):
>
> The PKI instance uses a different set of replication agreements, so you
> can't see those agreements with ipa-replica-manage, which handles only
> the IPA IdM instance.
>
>> [root at fileserver1 ~]# ipa-replica-manage list
>> fileserver1.example.com: master
>>
>> On the other server:
>>
>> [root at fileserver2 ~]# ipa-replica-manage list
>> fileserver1.example.com: master
>> fileserver2.example.com: master
>
> strange indeed.
>
>> When I try and add the missing replication:
>>
>> [root at fileserver1 ~]# ipa-replica-manage connect fileserver2.example.com
>> unexpected error: list index out of range
>>
>> Do I need to delete the replication from fileserver2?
>
> You can't remove a replication agreement if it is the only agreement you
> have. This is to avoid split-brain situations.
>
> Not sure how to handle a disappeared agreement, though; it's
> theoretically not possible unless you 'inadvertently' ran
> ipa-replica-manage --force del fileserver2 on fileserver1 ...

This is possible... oops. I tried a few times to add another replica
(fileserver3), which failed as I mentioned above. The replication
process got most of the way through and showed up on one of the
servers but not the other, so I removed the replica. It's possible
that I force-removed fileserver2 by mistake.

> Can you look into cn=config and see if you have references to
> fileserver2?
> Maybe it is just a bug in displaying the actually active replicas.

I'm using the 'jxplore' LDAP browser (my command-line LDAP skills aren't
very good, and I can't seem to get Kerberos authentication working
properly; in any case, I'm having trouble authenticating because of
the problems mentioned above) and did an unauthenticated search for
cn=config on fileserver1: no results.
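An anonymous bind may simply not be allowed to read cn=config, which could explain the empty result. A minimal sketch, assuming a simple bind as cn=Directory Manager (which avoids Kerberos entirely) and the 389-ds object class nsds5replicationagreement: run ldapsearch on fileserver1 and pipe its LDIF output into a small helper that prints the nsDS5ReplicaHost of each agreement. The helper name agreement_hosts is hypothetical.

```shell
#!/bin/sh
# Sketch only: agreement_hosts is a hypothetical helper, not a FreeIPA or
# 389-ds tool. It reads LDIF on stdin, e.g. the output of:
#   ldapsearch -x -LLL -D "cn=Directory Manager" -W -b cn=config \
#       "(objectclass=nsds5replicationagreement)" nsDS5ReplicaHost
# and prints the remote host named in each replication agreement.
agreement_hosts() {
    # LDAP attribute names are case-insensitive, so compare in lower case.
    # Assumes plain "attr: value" lines, not base64-encoded "attr:: value".
    awk -F': ' 'tolower($1) == "nsds5replicahost" { print $2 }'
}

# Usage (the -W option prompts for the Directory Manager password):
#   ldapsearch -x -LLL -D "cn=Directory Manager" -W -b cn=config \
#       "(objectclass=nsds5replicationagreement)" nsDS5ReplicaHost | agreement_hosts
```

If fileserver2 does not appear in the output on fileserver1, that would be consistent with the agreement having been deleted on that side.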

In cn=ipa,cn=etc there are cn=masters, which contains an entry for
fileserver1, and cn=replicas, which is empty.

Thanks,

Dan
