nss-ldap/nscd problems

Matthew B. Brookover mbrookov at mines.edu
Wed Aug 10 23:00:30 UTC 2005


I have a host running RedHat Enterprise 3 AS Release 5 academic version.

nscd seems to hang after a cache entry has timed out.  If you bounce
nscd, all of the hung processes continue as if there had been no problem.

Nscd is configured to time out entries after 1 hour.  To recreate the
hang, bounce nscd to get it working, log in to the host, wait for 1
hour, then try an ls -l or any other command that will call getpw*. 
Occasionally the lookup goes through, but most of the time nscd hangs.
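
To double-check which time-to-live values nscd is actually using, a small
script along these lines can list them.  This is only a sketch: it assumes
the usual "directive service value" layout of /etc/nscd.conf, and the SAMPLE
text is a hypothetical excerpt, not a copy of the file on this host.

```python
TTL_DIRECTIVES = ("positive-time-to-live", "negative-time-to-live")


def parse_nscd_ttls(text):
    """Return {(directive, service): seconds} for the TTL lines in nscd.conf text."""
    ttls = {}
    for raw in text.splitlines():
        # Drop comments, then split the remainder on whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] in TTL_DIRECTIVES:
            ttls[(fields[0], fields[1])] = int(fields[2])
    return ttls


# Hypothetical excerpt for illustration only (3600 s = the 1 hour mentioned above).
SAMPLE = """\
# sample nscd.conf fragment
enable-cache            passwd  yes
positive-time-to-live   passwd  3600
negative-time-to-live   passwd  20
"""

if __name__ == "__main__":
    for (directive, service), secs in sorted(parse_nscd_ttls(SAMPLE).items()):
        print(f"{service:10s} {directive:24s} {secs}s")
```

To check the real file, pass the contents of /etc/nscd.conf to
parse_nscd_ttls() instead of SAMPLE.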

We are using openldap 2.2.26, Kerberos 1.4.1, and sasl 2.1.21 on
dedicated ldap and kerberos servers.

Other clients running Fedora Core 3 work fine.

The client running Red Hat Enterprise Linux 3 AS release 5 is using the
versions of sasl, nss_ldap, nscd, etc. that came with the release:
cyrus-sasl-2.1.15-10
pam_krb5-1.75-1
krb5-devel-1.2.7-47
cyrus-sasl-gssapi-2.1.15-10
openldap-clients-2.0.27-17
openldap-devel-2.0.27-17
nss_ldap-207-15
krb5-workstation-1.2.7-47
krb5-libs-1.2.7-47
nscd-2.3.2-95.33

nss-ldap and nscd log these errors in /var/log/messages:
Aug  8 10:36:04 imagine nscd: nss_ldap: reconnecting to LDAP server...
Aug  8 10:36:04 imagine nscd: nss_ldap: reconnected to LDAP server after 1 attempt(s)
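
To line these reconnect messages up against the times of the hangs, something
like the following could pull the nss_ldap entries out of the log.  It is a
rough sketch: the regular expression assumes the traditional syslog layout
seen in the two lines above, and the function name is made up for this
example.

```python
import re

# Matches "Mon DD HH:MM:SS host program: nss_ldap: message".
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<prog>[^:\s]+):\s+nss_ldap:\s+(?P<msg>.*)$"
)


def nss_ldap_events(lines):
    """Return (timestamp, program, message) for each nss_ldap log line."""
    events = []
    for line in lines:
        match = PATTERN.match(line.rstrip("\n"))
        if match:
            events.append((match["stamp"], match["prog"], match["msg"]))
    return events


# The two messages quoted above, each as a single unwrapped line.
SAMPLE = [
    "Aug  8 10:36:04 imagine nscd: nss_ldap: reconnecting to LDAP server...",
    "Aug  8 10:36:04 imagine nscd: nss_ldap: reconnected to LDAP server after 1 attempt(s)",
]

if __name__ == "__main__":
    for stamp, prog, msg in nss_ldap_events(SAMPLE):
        print(stamp, prog, msg, sep=" | ")
```

For the real log, pass an open file object for /var/log/messages in place of
SAMPLE.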

Kerberos, GSSAPI, SASL, etc all work correctly.

When nscd is hung, any program that calls getpwuid, getpwnam or getpwent
will hang. I presume other functions that would cause a lookup through
nscd and nss_ldap will also hang.
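
One way to detect the hang without tying up a terminal is to do the lookup in
a child process and give up after a timeout.  The sketch below uses Python's
pwd module, which calls getpwnam underneath; the function name, the 10 second
timeout and the exit-code convention are choices made for this example.

```python
import subprocess
import sys
import time

# Child program: exit 0 if the user exists, 2 if getpwnam finds nothing.
CHILD = (
    "import pwd, sys\n"
    "try:\n"
    "    pwd.getpwnam(sys.argv[1])\n"
    "except KeyError:\n"
    "    sys.exit(2)\n"
)


def probe_getpwnam(name, timeout=10.0):
    """Return (status, seconds); status is found, not found, hung or error."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return "hung", time.monotonic() - start
    elapsed = time.monotonic() - start
    status = {0: "found", 2: "not found"}.get(proc.returncode, "error")
    return status, elapsed


if __name__ == "__main__":
    user = sys.argv[1] if len(sys.argv) > 1 else "root"
    status, elapsed = probe_getpwnam(user)
    print(f"getpwnam({user!r}): {status} after {elapsed:.2f}s")
```

Run from cron every few minutes with an LDAP-only account name, this would
show roughly when the lookups start to hang relative to the cache timeout.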

The server running RHEL 3.5 was originally installed with 3.4 and then
upgraded to 3.5.  After the upgrade, Kerberos, ldap, etc were
configured.  This may be a problem that is new to 3.5.  I did not test
ldap, kerberos, sasl, etc under 3.4.

When nscd is hung, you can log in as root and run an ldapsearch.  The
results are returned correctly.  I followed these steps to test the ldap
and kerberos servers:
1) rebooted the RHEL 3 release 5 ldap/kerberos client
2) logged in as myself
3) logged off
4) waited an hour for nscd's cache to time out
5) logged in as myself (the login hung before printing the password
prompt; I waited several minutes to make sure that it was not going to
continue)
6) logged in as root on another terminal.
7) ran an ldap search for my user and ran kinit (both worked)
8) ran 'service nscd restart'
9) went back to the first terminal, entered my password and was able to
log in.
10) waited 1 hour
11) ran ls -l, which then hung.  CTRL-C will unhang ls or any other
process that does not catch the signal.

There are times when nscd or nss_ldap will unhang on its own, and any
process blocked in getpw* then continues.

/etc/nsswitch.conf is set with:
passwd:     files ldap
shadow:     files ldap
group:      files ldap

#hosts:     db files nisplus nis dns
hosts:      files dns

# Example - obey only what nisplus tells us...
#services:   nisplus [NOTFOUND=return] files
#networks:   nisplus [NOTFOUND=return] files
#protocols:  nisplus [NOTFOUND=return] files
#rpc:        nisplus [NOTFOUND=return] files
#ethers:     nisplus [NOTFOUND=return] files
#netmasks:   nisplus [NOTFOUND=return] files

bootparams: nisplus [NOTFOUND=return] files

ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files

netgroup:   files

publickey:  nisplus

automount:  files
aliases:    files nisplus
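
To confirm which lookups can reach nss_ldap at all, the file can be scanned
for databases that list ldap as a source.  A minimal sketch follows; the
function name is invented for the example, and SAMPLE repeats only a few of
the lines above.

```python
import re


def databases_using(source, text):
    """Return the nsswitch.conf databases whose source list includes `source`."""
    hits = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if ":" not in line:
            continue
        database, _, rest = line.partition(":")
        # Remove action items such as [NOTFOUND=return]; keep source names only.
        sources = re.sub(r"\[[^\]]*\]", " ", rest).split()
        if source in sources:
            hits.append(database.strip())
    return hits


SAMPLE = """\
passwd:     files ldap
shadow:     files ldap
group:      files ldap
hosts:      files dns
bootparams: nisplus [NOTFOUND=return] files
"""

if __name__ == "__main__":
    print("ldap is consulted for:", ", ".join(databases_using("ldap", SAMPLE)))
```

With the configuration above, only passwd, shadow and group go through ldap,
which fits the hang showing up in getpw* calls.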

---------------------------------------------

The server is a Gateway 9515 with two 3 GHz Xeon processors and 4 GB RAM.
It will be serving email and other services very soon.  Fortunately, it
is not in production yet.

Any ideas?

Thank you.

Matt Brookover
mbrookov at mines.edu
303-273-3436




