nss-ldap/nscd problems
Matthew B. Brookover
mbrookov at mines.edu
Fri Aug 12 02:52:16 UTC 2005
I tend to agree, nscd is a huge pain. Unfortunately, nscd does speed
things up when it works.
I think we found the problem: the ldap servers were moved behind a
firewall that times out connections after one hour. We reset the timeout
for ldap to 10 hours. So far, the server has let me log in after 6
hours. The network people assure me that as long as there is traffic,
the connection will stay open. I think I would like to dump the
firewall along with nscd.
I have not looked at the source code for nss_ldap, but I would guess
that keepalive is not set on the socket. Nscd and/or nss_ldap also have
problems reconnecting after a network failure. Fedora Core 3 did not
have any trouble, suggesting that these problems were fixed in later
versions of nss_ldap and glibc.
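
For illustration only, here is a minimal Python sketch of what "keepalive
set on the socket" means for a TCP client. It is not taken from nss_ldap
or nscd; the helper name enable_keepalive and the 600/60/5 values are
assumptions chosen so that probes start well inside the firewall's
one-hour idle timeout. The TCP_KEEP* options are platform specific, so
the sketch only sets the ones the local socket module exposes.

```python
import socket


def enable_keepalive(sock, idle=600, interval=60, count=5):
    """Enable TCP keepalive probes on an existing socket.

    idle, interval and count are illustrative values (assumptions),
    not settings used by nss_ldap or nscd.
    """
    # Ask the kernel to probe the peer when the connection sits idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Tuning knobs are not available on every platform, so set only
    # the ones this Python build exposes.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is switched on for this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Note that on Linux the default idle time before the first keepalive probe
is 7200 seconds, so turning on SO_KEEPALIVE without also shortening the
idle time would probably not help against a firewall that drops idle
connections after one hour.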
The server is on its way to processing several thousand email messages a
day for quite a few users. I doubt it will be possible without nscd.
I would prefer to upgrade to Enterprise 4, but we have an iSCSI-based
SAN from Left Hand Networks. Last word was that iSCSI is not supported
in Enterprise 4. I hope they will get that fixed soon.
Matt
On Thu, 2005-08-11 at 16:26 -0500, Chris St. Pierre wrote:
> This may be off-base, but is there any reason you can't just kill
> nscd, remove it from your init scripts, and never speak the name of
> the grotesque beast again? That's probably one of my least favorite
> programs *ever* (right up there with Microsoft Bob and netinfo) and is
> one of the first things I get rid of. Do you need it?
>
> Chris St. Pierre
> Unix Systems Administrator
> Nebraska Wesleyan University
>
> On Wed, 10 Aug 2005, Matthew B. Brookover wrote:
>
> >I have a host running RedHat Enterprise 3 AS Release 5 academic version.
> >
> >NSCD seems to hang after a cache entry has timed out. If you bounce
> >nscd, all of the hung processes will continue like there was no problem.
> >
> >Nscd is configured to time out entries after 1 hour. To recreate the
> >hang, bounce nscd to get it working, log in to the host, wait for 1
> >hour, then try an ls -l or any other command that will call getpw*.
> >There are times when it does not hang, but most of the time nscd is
> >hung.
> >
> >We are using openldap 2.2.26, Kerberos 1.4.1, and sasl 2.1.21 on
> >dedicated ldap and kerberos servers.
> >
> >Other clients running Fedora Core 3 work fine.
> >
> >The client running redhat enterprise 3 AS release 5 is using the
> >versions of sasl, nss-ldap, nscd, etc that came with the release:
> >cyrus-sasl-2.1.15-10
> >pam_krb5-1.75-1
> >krb5-devel-1.2.7-47
> >cyrus-sasl-gssapi-2.1.15-10
> >openldap-clients-2.0.27-17
> >openldap-devel-2.0.27-17
> >nss_ldap-207-15
> >krb5-workstation-1.2.7-47
> >krb5-libs-1.2.7-47
> >nscd-2.3.2-95.33
> >
> >nss-ldap and nscd log these errors in /var/log/messages:
> >Aug 8 10:36:04 imagine nscd: nss_ldap: reconnecting to LDAP server...
> >Aug 8 10:36:04 imagine nscd: nss_ldap: reconnected to LDAP server after
> >1 attempt(s)
> >
> >Kerberos, GSSAPI, SASL, etc all work correctly.
> >
> >When nscd is hung, any program that calls getpwuid, getpwnam or getpwent
> >will hang. I presume other functions that would cause a lookup through
> >nscd and nss_ldap will also hang.
> >
> >The server running RHEL 3.5 was originally installed with 3.4 and then
> >upgraded to 3.5. After the upgrade, Kerberos, ldap, etc were
> >configured. This may be a problem that is new to 3.5. I did not test
> >ldap, kerberos, sasl, etc under 3.4.
> >
> >When nscd is hung, you can log in as root and run an ldapsearch. The
> >results are returned correctly. I followed these steps to test the ldap
> >and kerberos servers:
> >1) rebooted the RHEL 3 release 5 ldap/kerberos client
> >2) logged in as myself
> >3) logged off
> >4) waited an hour for nscd's cache to time out
> >5) logged in as myself (the login hung before printing the password
> >prompt; I waited several minutes to make sure that it was not going to
> >continue)
> >6) logged in as root on another terminal.
> >7) ran an ldap search for my user and ran kinit (both worked)
> >8) ran 'service nscd restart'
> >9) went back to the first terminal, entered my password and was able to
> >log in.
> >10) waited 1 hour
> >11) ran an ls -l, which then hung. CTRL-C will unhang ls or any other
> >process that does not catch the signal.
> >
> >There are times when nscd or nss-ldap will unhang on their own. Any
> >process calling getpw* will continue.
> >
> >/etc/nsswitch.conf is set with:
> >passwd: files ldap
> >shadow: files ldap
> >group: files ldap
> >
> >#hosts: db files nisplus nis dns
> >hosts: files dns
> >
> ># Example - obey only what nisplus tells us...
> >#services: nisplus [NOTFOUND=return] files
> >#networks: nisplus [NOTFOUND=return] files
> >#protocols: nisplus [NOTFOUND=return] files
> >#rpc: nisplus [NOTFOUND=return] files
> >#ethers: nisplus [NOTFOUND=return] files
> >#netmasks: nisplus [NOTFOUND=return] files
> >
> >bootparams: nisplus [NOTFOUND=return] files
> >
> >ethers: files
> >netmasks: files
> >networks: files
> >protocols: files
> >rpc: files
> >services: files
> >
> >netgroup: files
> >
> >publickey: nisplus
> >
> >automount: files
> >aliases: files nisplus
> >
> >---------------------------------------------
> >
> >The server is a Gateway 9515 with 2 3GHZ Xeon processors and 4GB RAM.
> >It will be serving email and other services very soon. Fortunately, it
> >is not in production yet.
> >
> >Any ideas?
> >
> >thank you.
> >
> >Matt Brookover
> >mbrookov at mines.edu
> >303-273-3436
> >
> >
> >--
> >redhat-list mailing list
> >unsubscribe mailto:redhat-list-request at redhat.com?subject=unsubscribe
> >https://www.redhat.com/mailman/listinfo/redhat-list
> >
>