[Freeipa-users] caching of lookups / performance problem

Wed Feb 1 15:12:20 UTC 2017

Alright cool, thank you for getting back to me.  I appreciate your input and expertise.

Dan

> On Feb 1, 2017, at 9:08 AM, Jakub Hrozek <jhrozek at redhat.com> wrote:
> 
> On Wed, Feb 01, 2017 at 02:35:00PM +0000, Sullivan, Daniel [CRI] wrote:
>> Jakub,
>> 
>> Thank you for getting back to me.  Yeah, I agree with what you are saying.  The problem that I’m really trying to solve is how to get them requested reasonably often.  A good use case for my problem is basically:
>> 
>> 1) Somebody starts an interactive job on a compute node (this is somewhat unusual in and of itself).  There’s a decent chance that nobody has done this for weeks or months in the first place.  Since a large number of our 1000 or so users aren’t compute users, there’s a high probability that we have a substantial number of expired cached entries, possibly 500 or more for users in /home.
>> 2) They are navigating around on the filesystem and cd into /home and type ‘ls -l’
>> 
>> This command will actually take upwards of an hour to execute (although it will complete eventually).  If an ‘ls -l’ on a Linux system takes more than a few seconds people will think there’s a problem with the system.
>> 
>> Based on my experience, even ‘nowait percentage’ has a difficult time with a large number of records past the nowait threshold.  For example, if there are 500 records past the expiration percentage threshold, the data provider will get ‘busy’, which effectively appears to block the nss responder, instead of returning all 500 of those records from the cache and then queueing 500 data provider requests in the background to refresh the cache.
> 
> Yes, when the cache is totally expired, the request would block.
> 
>> 
>> Right now the only ways I can find to get around this are to run a regular ‘ls -l’ to refresh the cache on our nodes, or to just defer the problem by setting a really high entry cache timeout.  The cron approach is a little bit challenging because we need to randomize invocation times, since bulk cache refreshes across the environment are going to cause high load on our domain controllers (I know this because a single cache refresh causes ns-slapd to hit 100% CPU utilization and sustain it for the duration of the enumeration).
>> 
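One way to spread the cron-driven refreshes described above is to derive a stable per-host offset from the hostname, so nodes do not all hit the domain controllers at once. This is only a rough sketch under stated assumptions: the function names, the 24-hour window, and the choice of /home are illustrative and not from the thread. The warm-up resolves file owners through the normal NSS path, which is roughly what 'ls -l' does.

```python
import hashlib
import os
import pwd


def warmup_offset_seconds(hostname, window_seconds=86400):
    """Map a hostname to a stable offset inside the refresh window.

    Hashing (rather than random.random()) keeps the offset the same on
    every run, so each node always refreshes in the same slot.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


def cron_hour_minute(hostname, window_seconds=86400):
    """Turn the offset into an (hour, minute) pair for a crontab entry."""
    offset = warmup_offset_seconds(hostname, window_seconds)
    return offset // 3600, (offset % 3600) // 60


def warm_cache(path="/home"):
    """Look up the owner of each entry in `path`, once per distinct UID.

    Returns the number of distinct UIDs seen. Unknown UIDs are skipped.
    """
    seen = set()
    with os.scandir(path) as entries:
        for entry in entries:
            uid = entry.stat(follow_symlinks=False).st_uid
            if uid in seen:
                continue
            seen.add(uid)
            try:
                pwd.getpwuid(uid)  # goes through NSS, so it touches the cache
            except KeyError:
                pass  # UID has no passwd entry; nothing to refresh
    return len(seen)


if __name__ == "__main__":
    hour, minute = cron_hour_minute(os.uname().nodename)
    print(f"suggested crontab slot: {minute} {hour} * * *")
```

The offset only decides when a node runs; each run still costs the servers one lookup per expired entry, so the window should be wide enough for the number of nodes involved.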
>> Is there anything crazy about setting the entry cache timeout on the client to something arbitrarily high, like 5 years (other than knowing the cache is not accurate)?  Based on my knowledge a user’s groups are evaluated at login so this should be a non-issue from a security standpoint.
> 
> I think a long expiration together with the nowait percentage might be
> a way to go.
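
To make the interaction between a long expiration and the nowait percentage concrete, here is a small sketch of how a lookup would be treated depending on the age of the cached entry. It follows the behaviour described in this thread and in the sssd.conf documentation for entry_cache_timeout and entry_cache_nowait_percentage; the function name, the state labels, and the handling of the exact boundary values are assumptions made for illustration, not SSSD code.

```python
def classify_lookup(age_seconds, entry_cache_timeout, nowait_percentage):
    """Classify how a cached entry of a given age would be served.

    - "blocking refresh": entry is past entry_cache_timeout, so the
      request waits for the data provider.
    - "cached + background refresh": entry is past the nowait threshold,
      so it is returned at once and refreshed out of band.
    - "cached": entry is fresh enough to be returned with no refresh.
    """
    if age_seconds >= entry_cache_timeout:
        return "blocking refresh"
    nowait_threshold = entry_cache_timeout * nowait_percentage / 100
    # A percentage of 0 disables the background-refresh behaviour.
    if nowait_percentage > 0 and age_seconds >= nowait_threshold:
        return "cached + background refresh"
    return "cached"


DAY = 86400
FIVE_YEARS = 5 * 365 * DAY

if __name__ == "__main__":
    for age_days in (90, 3 * 365, 6 * 365):
        state = classify_lookup(age_days * DAY, FIVE_YEARS, 50)
        print(f"entry aged {age_days} days: {state}")
```

With a five-year timeout and a percentage of 50, an entry untouched for three months is simply served from the cache, and background refreshes only begin after two and a half years. A lower percentage would start the background refreshes earlier while still avoiding the blocking case.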