nss-ldap/nscd problems

Matthew B. Brookover mbrookov at mines.edu
Fri Aug 12 02:52:16 UTC 2005


I tend to agree, nscd is a huge pain. Unfortunately, nscd does speed
things up when it works.

I think we found the problem: the ldap servers were moved behind a
firewall that times out connections after one hour.  We reset the timeout
for ldap to 10 hours.  So far, the server has let me log in after 6
hours.  The network people assure me that as long as there is traffic,
the connection will stay open.  I think I would like to dump the
firewall along with nscd.

I have not looked at the source code for nss_ldap, but I would guess
that keepalive is not set on the socket.  Nscd and/or nss_ldap also have
problems reconnecting after a network failure.  Fedora Core 3 did not
have any trouble, suggesting that these problems were fixed in later
versions of nss_ldap and glibc.
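
For anyone unfamiliar with what "keepalive set on the socket" would mean
here, a minimal sketch follows.  It is Python rather than the C that
nss_ldap is written in, and it is not taken from nss_ldap; the function
name and the timing values are assumptions chosen only so that probes
would start well inside a one-hour firewall idle timeout.

```python
import socket


def enable_keepalive(sock, idle_s=1800, interval_s=60, probes=5):
    """Turn on TCP keepalive so an idle connection still sees traffic.

    The default timings are illustrative assumptions: start probing after
    30 idle minutes, well before a one-hour firewall timeout.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The three tuning options below exist on Linux; skip them elsewhere
    # and fall back to the system-wide keepalive defaults.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Non-zero means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Without the per-socket tuning, Linux normally waits two hours before the
first probe, which would be too late for a firewall that drops idle
connections after one hour.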

The server is on its way to processing several thousand email messages a
day for quite a few users.  I doubt it will be possible without nscd.

I would prefer to upgrade to Enterprise 4, but we have an iSCSI-based
SAN from Left Hand Networks.  Last word was that iSCSI is not supported
in Enterprise 4.  I hope they will get that fixed soon.

Matt


On Thu, 2005-08-11 at 16:26 -0500, Chris St. Pierre wrote:
> This may be off-base, but is there any reason you can't just kill
> nscd, remove it from your init scripts, and never speak the name of
> the grotesque beast again?  That's probably one of my least favorite
> programs *ever* (right up there with Microsoft Bob and netinfo) and is
> one of the first things I get rid of.  Do you need it?
> 
> Chris St. Pierre
> Unix Systems Administrator
> Nebraska Wesleyan University
> 
> On Wed, 10 Aug 2005, Matthew B. Brookover wrote:
> 
> >I have a host running RedHat Enterprise 3 AS Release 5 academic version.
> >
> >NSCD seems to hang after a cache entry has timed out.  If you bounce
> >nscd, all of the hung processes will continue like there was no problem.
> >
> >Nscd is configured to time out entries after 1 hour.  To recreate the
> >hang, bounce nscd to get it working, log in to the host, wait for 1
> >hour, then try an ls -l or any other command that will call getpw*. 
> >There are times when it does not hang, but most of the time nscd is
> >hung.
> >
> >We are using openldap 2.2.26, Kerberos 1.4.1, and sasl 2.1.21 on
> >dedicated ldap and kerberos servers.
> >
> >Other clients running Fedora Core 3 work fine.
> >
> >The client running redhat enterprise 3 AS release 5 is using the
> >versions of sasl, nss-ldap, nscd, etc that came with the release:
> >cyrus-sasl-2.1.15-10
> >pam_krb5-1.75-1
> >krb5-devel-1.2.7-47
> >cyrus-sasl-gssapi-2.1.15-10
> >openldap-clients-2.0.27-17
> >openldap-devel-2.0.27-17
> >nss_ldap-207-15
> >krb5-workstation-1.2.7-47
> >krb5-libs-1.2.7-47
> >nscd-2.3.2-95.33
> >
> >nss-ldap and nscd log these errors in /var/log/messages:
> >Aug  8 10:36:04 imagine nscd: nss_ldap: reconnecting to LDAP server...
> >Aug  8 10:36:04 imagine nscd: nss_ldap: reconnected to LDAP server after
> >1 attempt(s)
> >
> >Kerberos, GSSAPI, SASL, etc all work correctly.
> >
> >When nscd is hung, any program that calls getpwuid, getpwnam or getpwent
> >will hang. I presume other functions that would cause a lookup through
> >nscd and nss_ldap will also hang.
> >
> >The server running RHEL 3.5 was originally installed with 3.4 and then
> >upgraded to 3.5.  After the upgrade, Kerberos, ldap, etc were
> >configured.  This may be a problem that is new to 3.5.  I did not test
> >ldap, kerberos, sasl, etc under 3.4.
> >
> >When nscd is hung, you can log in as root and run an ldapsearch.  The
> >results are returned correctly.  I followed these steps to test the ldap
> >and kerberos servers:
> >1) rebooted the RHEL 3 release 5 ldap/kerberos client
> >2) logged in as myself
> >3) logged off
> >4) waited an hour for nscd's cache to time out
> >5) logged in as myself (the login hung before printing the password
> >prompt; I waited several minutes to make sure that it was not going to
> >continue)
> >6) logged in as root on another terminal.
> >7) ran an ldap search for my user and ran kinit (both worked)
> >8) ran 'service nscd restart'
> >9) went back to the first terminal, entered my password and was able to
> >log in.
> >10) waited 1 hour
> >11) ran an ls -l, which then hung.  CTRL-c will unhang ls or any other
> >process that does not catch the signal.
> >
> >There are times when nscd or nss-ldap will unhang on their own.  Any
> >process calling getpw* will continue.
> >
> >/etc/nsswitch.conf is set with:
> >passwd:     files ldap
> >shadow:     files ldap
> >group:      files ldap
> >
> >#hosts:     db files nisplus nis dns
> >hosts:      files dns
> >
> ># Example - obey only what nisplus tells us...
> >#services:   nisplus [NOTFOUND=return] files
> >#networks:   nisplus [NOTFOUND=return] files
> >#protocols:  nisplus [NOTFOUND=return] files
> >#rpc:        nisplus [NOTFOUND=return] files
> >#ethers:     nisplus [NOTFOUND=return] files
> >#netmasks:   nisplus [NOTFOUND=return] files
> >
> >bootparams: nisplus [NOTFOUND=return] files
> >
> >ethers:     files
> >netmasks:   files
> >networks:   files
> >protocols:  files
> >rpc:        files
> >services:   files
> >
> >netgroup:   files
> >
> >publickey:  nisplus
> >
> >automount:  files
> >aliases:    files nisplus
> >
> >---------------------------------------------
> >
> >The server is a Gateway 9515 with 2 3GHZ Xeon processors and 4GB RAM. 
> >It will be serving email and other services very soon.  Fortunately, it
> >is not in production yet.
> >
> >Any ideas?
> >
> >thank you.
> >
> >Matt Brookover
> >mbrookov at mines.edu
> >303-273-3436
> >
> >
> >-- 
> >redhat-list mailing list
> >unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
> >https://www.redhat.com/mailman/listinfo/redhat-list
> >
> 
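
The manual test in the quoted message (wait for the cache to expire, then
see whether a getpw* call hangs) could be automated along these lines.
This is only a sketch: the function name, the 10-second limit, and the
result labels are assumptions, and the lookup runs in a child process so
that a hung call cannot block the checker itself.

```python
import subprocess
import sys

# One-line child program: performs a single getpwnam() for the given name.
_CHILD = "import pwd, sys; pwd.getpwnam(sys.argv[1])"


def check_lookup(user, timeout_s=10.0):
    """Return 'ok', 'not-found', or 'hung' for a getpwnam() of `user`."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD, user],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return "hung"
    # Any non-zero exit is reported as 'not-found'; the usual cause is the
    # KeyError that getpwnam() raises for an unknown user.
    return "ok" if result.returncode == 0 else "not-found"


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "root"
    print(name, check_lookup(name))
```

Run from cron a little over an hour after a restart, a 'hung' result
would correspond to the state where 'service nscd restart' was needed.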



