[Freeipa-users] Needs help understand this timeout issue

Wed Feb 8 13:12:12 UTC 2017

Are you actually logging in or or just doing a lookup on a user?  I remember reading somewhere that groups are always re-evaluated at the point of login, regardless of what is in the cache.  I am not sure if this is accurate or the implications of whether or not it is on the client, server or both, but this could be what is happening.

Also, how are verifying that the group in the cache on the server is valid?  If you do a lookup a user on the server after the mapped memory cache expires (i.e. if you restart sssd), do you see a log entry that a valid cache entry has been found and that entry is being returned (from memory it is in the nsss logs)?  It might help you to focus on the sysdb cache by setting a very low timeout value for the in memory mapped cache and or negative cache timeouts. Based on my experiencif a valid cache entry in the sysdb *NOT* found, a log entry is not returned (just a lookup is performed), which for an untrained eye makes it difficult to assess when timeouts are occurring in the sysdb cache. 

Also, five minutes is a ridiculously long time to have to wait, have you inspected the performance of your 389 server?  Is your environment large or under large load?  Based on experience and another thread from this list, I have seen the 389ds threads get tied up due to extop operations because AD lookups via SSSD eventually will be blocked sssd under high load.  

I had to significantly scale 389ds worker threads and processor cores to accommodate large numbers of concurrent AD lookups, this improved the performance of our environment and solved many issues.

Dan

> On Feb 8, 2017, at 1:27 AM, Troels Hansen <th at casalogic.dk> wrote:
> 
> No, ignore_group_members option is already set.
> Tried setting and removing it on the client but didn't make a huge different.
> 
> Is a client have an completely empty cache, newly joined, empty /var/lig/sssd/db and mc cache or something, the IPA server ALWAYS asks the AD for group info, despite having a valid cache.
> 
> This leads to a delay of up to, or even more 5 minutes before a user can login. Being denied access at first, but logging again later with success.
> 
> If the client have a cache thats only expired its able to pull the account info from the cache on the IPA, without the IPA having to contact the AD.
> 
> 
> ----- On Feb 6, 2017, at 2:24 PM, Sullivan, Daniel [CRI] dsullivan2 at bsd.uchicago.edu wrote:
> 
>> Have you looked at the ignore_group_members option?  Maybe this is the problem
>> you are seeing?
>> 
>> https://jhrozek.wordpress.com/2015/08/19/performance-tuning-sssd-for-large-ipa-ad-trust-deployments/
>> 
>> ==snip==
>> 
>> ignore_group_members
>> Normally the most data-intensive operation is downloading the groups including
>> their members. Usually, we are interested in what groups a user is a member of
>> (id aduser at ad_domain) as the initial step rather than what members do specific
>> groups include (getent group adgroup at ad_domain). Setting the
>> ignore_group_members option to True makes all groups appear as empty, thus
>> downloading only information about the group objects themselves and not their
>> members, providing a significant performance boost. Please note that id
>> aduser at ad_domain would still return all the correct groups!
>> ==snip==
>> 
>> 
>> Dan
>> 
>> On Feb 6, 2017, at 1:59 AM, Troels Hansen <th at casalogic.dk> wrote:
>> 
>> Hi
>> 
>> I'm aware of the anatomy of how the lookup is done, but I would suppose a valid
>> cache on the IPA server would result in the cache from the IPA server being
>> used?
>> 
>> I have been debugging this issue some more, and can confirm is the client have
>> its sssd cache invalidated by "sss_cache -E" and the IPA server have a valid
>> cache, when the client asks for the user, the IPA server still asks the AD for
>> the entire group info?
>> 
>> I can see, that even though the cache is refreshed the attribute
>> initgrExpireTimestamp (in the ldb cache) isn't updated.
>> I have been unable to find out exactly what this controls?
>> 
>> lastUpdate and dataExpireTimestamp is updated to the time stamp of when the
>> refresh ran.
>> 
>> 
>> ----- On Feb 1, 2017, at 2:27 PM, Sullivan, Daniel [CRI]
>> dsullivan2 at bsd.uchicago.edu wrote:
>> 
>> Have you checked to see if the user is expired in the cache, or if it is
>> impacted by entry_cache_nowait_percentage (ref sssd.conf). The default entry
>> timeout is only 90 minutes and entry_cache_nowait_percentage default is 50.
>> 
>> ldbsearch -H  /var/lib/sss/db/timestamps_xxx.xxx.uchicago.edu.ldb >
>> ~/timestamps.txt
>> ldbsearch -H  /var/lib/sss/db/cache_xxx.xxx.uchicago.edu.ldb > ~/cache.txt
>> 
>> Might be able to provide more info.
>> 
>> Again, I’d really familiarize yourself with the anatomy of an sssd lookup, if
>> you get to know the diagram and steps 1-7 here it will be a big help to your
>> question(s).
>> https://jhrozek.wordpress.com/2015/03/11/anatomy-of-sssd-user-lookup/
>> 
>> Dan
>> 
>> 
>> On Feb 1, 2017, at 4:32 AM, Troels Hansen <th at casalogic.dk> wrote:
>> 
>> From looking af at TCP dump, I can see that if a client requests a AD user from
>> IPA, IPA does a full user lookup in AD, even though the IPA server have the
>> user in local cache?
>> 
>> It looks like a single group generates a LOT of traffic to AD like:
>> client -> IPA
>> IPA -> client
>> IPA -> AD
>> AD -> IPA
>> IPA -> AD
>> IPA -> Second AD
>> Second AD -> IPA
>> IPA -> Second AD
>> IPA -> AD
>> AD -> IPA
>> IPA -> AD
>> IPA -> client
>> client -> IPA
>> IPA -> Client
>> 
>> Something similar continues for every group the user has.
>> 
>> Soo, I guess the question that I haven't been able to find documented anywhere.
>> Isn't the SSSD group cache on the IPA servers supposed to be used then a sssd
>> client requests a user?
>> 
>> 
>> 
>> ----- On Feb 1, 2017, at 9:53 AM, Troels Hansen th at casalogic.dk wrote:
>> 
>> Hmm, suspect its happening on the server...... thous I haven't been able to
>> pinpoint a log entry that confirms my suspecting.
>> 
>> I have pinpointed the timeout to happen after 58 seconds after completely
>> removing the SSSD cache and restaring SSSD, which leads me to think my issue is
>> related to ldap_enumaration_search_timeout which defaults to 60.
>> 
>> rm -fr /var/lib/sss/{mc,db}/* && systemctl restart sssd && time id shja
>> 
>> However, I'm unable to crank it up on the server as it seems to be adjusted Down
>> to 60 Again?
>> 
>> Wed Feb  1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_get_options] (0x0400): Option
>> ldap_enumeration_search_timeout has value 120
>> (Wed Feb  1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_copy_options_ex] (0x0400):
>> Option ldap_enumeration_search_timeout has value 60
>> (Wed Feb  1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_copy_options_ex] (0x0400):
>> Option ldap_enumeration_search_timeout has value 60
>> 
>> LDAP seems speedy enough, not timeouts while querying it manually while a client
>> is doing a user lookup.
>> 
>> ----- On Jan 30, 2017, at 6:06 PM, Sullivan, Daniel [CRI]
>> dsullivan2 at bsd.uchicago.edu wrote:
>> 
>> 
>> If the timeout is occurring on the server, I would start by increasing one or
>> both of these values:
>> 
>> ldap_opt_timeout
>> ldap_search_timeout
>> 
>> If that doesn’t work I’d take look to see if the 389 server is under high load
>> when you are performing this operation.  The easiest way I have found to do
>> this is to just execute an LDAP query directly against the IPA server when you
>> are performing an id lookup, for example:
>> 
>> ldapsearch -D "cn=Directory Manager" -w <pw> -s base -b "cn=config"
>> "(objectclass=*)”
>> 
>> If the LDAP server is not responsive you probably need to increase the number of
>> worker threads for 389ds.  Also, you might want to disable referrals, check out
>> the man pages for this;
>> 
>> ldap_referrals = false
>> 
>> Also, FWIW, if you crank up debug logging on the sssd client, you should be able
>> to see the amount of seconds of timeout assigned to the operation, and witness
>> the fact that the operation is actually timing out by inspecting the logs
>> themselves.  The logs can get a little verbose but the data is there.
>> 
>> 
>> --
>> Manage your subscription for the Freeipa-users mailing list:
>> https://www.redhat.com/mailman/listinfo/freeipa-users
>> Go to http://freeipa.org for more info on the project
>> 
>> --
>> Med venlig hilsen
>> 
>> Troels Hansen
>> 
>> Systemkonsulent
>> 
>> Casalogic A/S
>> 
>> 
>> T (+45) 70 20 10 63
>> 
>> M (+45) 22 43 71 57
>> 
>> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og
>> meget mere.
>> 
>> --
>> Manage your subscription for the Freeipa-users mailing list:
>> https://www.redhat.com/mailman/listinfo/freeipa-users
>> Go to http://freeipa.org for more info on the project
>> 
>> --
>> Med venlig hilsen
>> 
>> Troels Hansen
>> 
>> Systemkonsulent
>> 
>> Casalogic A/S
>> 
>> 
>> T (+45) 70 20 10 63
>> 
>> M (+45) 22 43 71 57
>> 
>> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og
>> meget mere.
> 
> -- 
> Med venlig hilsen 
> 
> Troels Hansen 
> 
> Systemkonsulent 
> 
> Casalogic A/S 
> 
> 
> T (+45) 70 20 10 63 
> 
> M (+45) 22 43 71 57 
> 
> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og meget mere.