[Freeipa-users] performance scaling of sssd / freeipa

Sullivan, Daniel [CRI] dsullivan2 at bsd.uchicago.edu
Thu Jan 19 22:14:50 UTC 2017


Hi,

I’ve received incredibly good support from this mailing list previously, and I am hoping that somebody can help me with an ongoing problem.  I have spent a few days on this and can’t figure out how to address it.  On my DCs I am seeing excessive ldap_search_ext and sdap_get_generic_ext_recv timeouts caused solely by invoking the ‘id’ command on sssd clients.  The problem seems to appear only when I parallelize lookups for an ‘uncached’ user (i.e. one for whom no initial lookup has ever been performed).  Individual one-off lookups for a single uncached user on a single system almost always work fine.  This leads me to believe it is a performance tuning issue.

We operate in an academic research computing unit (i.e. we have an HPC cluster), and I need to be able to look up the same user in parallel (using the id command) across a relatively large number of systems, for example to spawn jobs that require large numbers of CPU cores and/or large amounts of memory.  Right now I am doing about 50 parallel lookups for the same user to induce this problem.

Here is some background information:

1) I have read Jakub's “Anatomy of an SSSD Lookup” and “Performance Tuning of SSSD for large IPA-AD deployments”, and have implemented the recommendations from the performance tuning doc, including moving the sssd cache to tmpfs.
2) We are on ipa-server 4.4.0-14.el7_3.4 using a trusted AD domain; all of our consumed users and groups are in the AD trusted domain.  We have two domain controllers; each is a RHEL 7.3 VM with 6 GB of memory.  Almost all (if not all) of our clients are running at least sssd 1.14, and are all RHEL 6/7.  Neither DC is swapping, and both have 2 CPUs.
3) I have set the following options in the SSSD configuration on the DCs and on all clients (the problem persists):
  a) ldap_opt_timeout = 60
  b) ldap_search_timeout = 60
4) On both DCs, I can clear the SSSD cache and look up all 2000 or so users in my environment with 40 concurrent lookups running locally on each DC (using UNIX job control).  I can process all 2000 lookups in this manner without any failures (on either DC), and have ‘pre-populated’ the SSSD cache on both DCs by doing this.
5) I have made no additional performance tuning changes other than what has been described.
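
For anyone who wants to reproduce the concurrent lookup test from item 4, it could be sketched roughly as follows.  This is an illustrative Python sketch under stated assumptions, not the exact commands used (the original test used UNIX job control).  The function names, the worker count and the user name user@ad.example.com are hypothetical placeholders; only the ‘id’ command comes from the description above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def lookup(user, command=("id",)):
    """Run `<command> <user>` (default: `id <user>`) and return (user, exit code)."""
    try:
        result = subprocess.run(
            [*command, user],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return user, result.returncode
    except OSError:
        # The command could not be started at all; report it as a failure.
        return user, 127


def parallel_lookups(users, workers=40, command=("id",)):
    """Look up every entry in `users`, running up to `workers` lookups at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `users`.
        return list(pool.map(lambda u: lookup(u, command), users))


def failed(results):
    """Return the users whose lookup exited with a non-zero status."""
    return [user for user, returncode in results if returncode != 0]


if __name__ == "__main__":
    # Hypothetical placeholder: the same (ideally uncached) user, 50 times.
    users = ["user@ad.example.com"] * 50
    results = parallel_lookups(users, workers=50)
    print(f"{len(failed(results))} of {len(results)} lookups failed")
```

Note that this only generates concurrent lookups from a single host.  Reproducing the failure described above would require launching the same lookup from many clients at once (for example through the cluster's job scheduler), which the sketch does not attempt.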

Would anybody be able to advise on any tuning that might be required (presumably on the DCs) to support 50 parallel lookups without experiencing sdap_get_generic_ext_recv or ldap_search_ext timeouts?  Should I be able to do this sort of thing with relative ease?  I was hoping this would be the sort of thing that would just work, but based on my relatively extensive testing it doesn’t.  Any advice anybody could provide would be greatly appreciated.

Thank you,

Dan Sullivan





