[Freeipa-users] ipa-replica-install IPA startup timing issue

Daryl Fonseca-Holt Daryl.Fonseca-Holt at umanitoba.ca
Wed Mar 23 21:23:52 UTC 2016


Forgot to CC the ML. Sorry.

  --
  Daryl Fonseca-Holt
  IST/CNS/Unix Server Team
  University of Manitoba
  204.480.1079

On Wed, 23 Mar 2016, Daryl Fonseca-Holt wrote:

> Hi Thierry,
>
> I have not filed a support request with RedHat for two reasons. First, it 
> seems that the NIS priming may not be a problem in the post 4.2.0 release. 
> Second, I am able to work around the problem by modifying the code where it 
> uses the number of krb5kdc daemons to start thus alleviating the crush of 64 
> daemons starting. As suggested by Alexander I'm patching 
> ipaserver/install/krbinstance.py.
>
> # diff -c ipaserver/install/krbinstance.py.df 
> ipaserver/install/krbinstance.py
> *** ipaserver/install/krbinstance.py.df    2015-06-18 07:54:49.000000000 
> -0500
> --- ipaserver/install/krbinstance.py    2016-03-22 13:37:16.056210609 -0500
> ***************
> *** 355,360 ****
> --- 355,362 ----
>
>          MIN_KRB5KDC_WITH_WORKERS = "1.9"
>          cpus = os.sysconf('SC_NPROCESSORS_ONLN')
> +     #XXX
> +     cpus = 1
>          workers = False
>          (stdout, stderr, rc) = ipautil.run(['klist', '-V'], 
> raiseonerr=False)
>          if rc == 0:
>
>
> With this patch the constant in /etc/sysconfig/krb5kdc is lower. After the 
> ipa-replica-install completes I will increase this manually to the original 
> value.
>
> This has allowed the building phase of the project to continue.
>
> Thanks, Daryl
>
> On 03/23/16 04:40, thierry bordaz wrote:
>> Hi Daryl,
>> 
>> Me again... :-)
>> As a follow up of this issue I would like to know if you already open a 
>> case to RH support ?
>> 
>> Also, have you identified a workaround to make ipa-replica-install 
>> successful or are you still suffering from this issue ?
>> 
>> best regards
>> thierry
>> 
>> On 03/16/2016 04:24 PM, thierry bordaz wrote:
>>> Hello Daryl,
>>>
>>>     I can reproduce locally the slow DS startup (due to slapi-nis
>>>     priming). In fact the version I was using had not the slapi-nis
>>>     fix to differ the priming.
>>>
>>>     I failed to reproduce the intensive load on DS when krb5kdc startup.
>>>     Looking at yours logs, we can see that krb5kdc startup triggers a
>>>     set of requests during 3s up to 8s. The logs are looking like
>>>     (note the etime can go up to 2s):
>>>
>>>     [10/Mar/2016:14:20:35 -0600] conn=40 fd=87 slot=87 connection
>>>     from local to /var/run/slapd-UOFMT1.socket
>>>     [10/Mar/2016:14:20:36 -0600] conn=40 AUTOBIND dn="cn=Directory
>>>     Manager"
>>>     [10/Mar/2016:14:20:36 -0600] conn=40 op=0 BIND dn="cn=Directory
>>>     Manager" method=sasl version=3 mech=EXTERNAL
>>>     [10/Mar/2016:14:20:36 -0600] conn=40 op=0 RESULT err=0 tag=97
>>>     nentries=0 *etime=1* dn="cn=Directory Manager"
>>>     [10/Mar/2016:14:20:36 -0600] conn=40 op=1 SRCH
>>>     base="cn=UOFMT1,cn=kerberos,dc=uofmt1" scope=0
>>>     filter="(objectClass=*)" attrs=ALL
>>>     [10/Mar/2016:14:20:36 -0600] conn=40 op=1 RESULT err=0 tag=101
>>>     nentries=1 etime=0
>>>     [10/Mar/2016:14:20:36 -0600] conn=40 op=2 SRCH
>>>     base="cn=ipaConfig,cn=etc,dc=uofmt1" scope=0
>>>     filter="(objectClass=*)" attrs="ipaConfigString ipaKrbAuthzData
>>>     ipaUserAuthType"
>>>     [10/Mar/2016:14:20:36 -0600] conn=40 op=2 RESULT err=0 tag=101
>>>     nentries=1 etime=0
>>>     [10/Mar/2016:14:20:36 -0600] conn=40 op=3 SRCH base="dc=uofmt1"
>>>     scope=2 filter="(objectClass=ipaNTDomainAttrs)"
>>>     attrs="ipaNTFlatName ipaNTFallbackPrimaryGroup
>>>     ipaNTSecurityIdentifier"
>>>     [10/Mar/2016:14:20:37 -0600] conn=40 op=3 RESULT err=0 tag=101
>>>     nentries=0 *etime=1*
>>>     [10/Mar/2016:14:20:37 -0600] conn=40 op=4 SRCH
>>>     base="cn=UOFMT1,cn=kerberos,dc=uofmt1" scope=0
>>>     filter="(krbMKey=*)" attrs="krbMKey"
>>>     [10/Mar/2016:14:20:37 -0600] conn=40 op=4 RESULT err=0 tag=101
>>>     nentries=1 etime=0
>>>     [10/Mar/2016:14:20:37 -0600] conn=40 op=5 SRCH base="dc=uofmt1"
>>>     scope=2
>>>     filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincipal))(krbPrincipalName=K/M at UOFMT1))"
>>>     attrs="krbPrincipalName krbCanonicalName ipaKrbPrincipalAlias
>>>     krbUPEnabled krbPrincipalKey krbTicketPolicyReference
>>>     krbPrincipalExpiration krbPasswordExpiration
>>>     krbPwdPolicyReference krbPrincipalType krbPwdHistory
>>>     krbLastPwdChange krbPrincipalAliases krbLastSuccessfulAuth
>>>     krbLastFailedAuth krbLoginFailedCount krbExtraData
>>>     krbLastAdminUnlock krbObjectReferences krbTicketFlags
>>>     krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory
>>>     ipaKrbAuthzData ipaUserAuthType ipatokenRadiusConfigLink objectClass"
>>>     [10/Mar/2016:14:20:38 -0600] conn=40 op=5 RESULT err=0 tag=101
>>>     nentries=1 *etime=1*
>>>     [10/Mar/2016:14:20:39 -0600] conn=40 op=6 UNBIND
>>>     [10/Mar/2016:14:20:39 -0600] conn=40 op=6 fd=87 closed - U1
>>> 
>>>
>>>     I think the request op=3 (SRCH base="dc=uofmt1" scope=2
>>>     filter="(objectClass=ipaNTDomainAttrs)") is slow also because of
>>>     slapi-nis. In fact it is indexed and returns 0 entry. So only
>>>     plugins can create this high etime.
>>>     An improvement in slapi-nis makes its search callback noop when
>>>     it comes from krb and I am running this improvement.
>>>
>>>     In conclusion I think both slow DS startup and KRB5 startup are
>>>     fixed in RHEL 7.
>>>
>>>     thanks
>>>     theirry
>>> 
>>> 
>>> On 03/14/2016 08:46 PM, Daryl Fonseca-Holt wrote:
>>>> Hello Thierry,
>>>> 
>>>> Attached is the pstacks from only the final DS restart. I don't think 
>>>> they will show the whole picture.
>>>> 
>>>> According to the debug log /var/log/ipareplica-install.log (attached) the 
>>>> start of the krb5kdc.service (19:13:16Z) is successful, but the krb5kdc 
>>>> log (attach) shows it is unable to fetch the master K/M at 14:31:31CDT 
>>>> (-5hour offset). This is when the install log shows kadmind failing.
>>>> 
>>>> In my experience with the master observing top there are two intense 
>>>> times for ns-slapd-<instance>. The first when it start, of course, and 
>>>> the second when krb5kdc starts. I assume this is because krb5kdc must get 
>>>> it's configuration and data from the same DS. krb5kdc fails but the 
>>>> ipareplica-install script isn't aware of it. Finally kadmin.service tries 
>>>> to access krb5kdc and finds that it is dead.
>>>> 
>>>> Please note these logs are with Schema Compatability and NIS plugins 
>>>> turned off per the other e-mail from Alexander.
>>>> 
>>>> I've noticed on a running master I can prevent this type of failure by 
>>>> manually starting dirsrv (systemctl start dirsrv@<instance>.service), 
>>>> watch top until all threads of ns-slapd have settled, then systemctl 
>>>> start krb5kdc.service, again watching top until ns-slapd threads have 
>>>> settled down before systemctl start kadmin.service.  This kind of manual 
>>>> intervention is is not possible when running the ipareplica-install 
>>>> script.
>>>> 
>>>> I will look into introducing a delay at the completion of the dirsrv and 
>>>> krb5kdc systemd units and see if I can accommodate ipareplica-install. 
>>>> Just as an experiment for now. I need to advance the project into High 
>>>> Availability testing but cannot do so without a functioning replica.
>>>> 
>>>> Regards, Daryl
>>>> 
>>>> On 03/14/16 09:20, thierry bordaz wrote:
>>>>> Hi Daryl,
>>>>> 
>>>>> Thanks for all the data. I will look at the pstacks. A first look shows 
>>>>> that you capture import, bind... so may be a complete 
>>>>> ipa-replica-install session.
>>>>> I will try to retrieve the specific startup time to see what was going 
>>>>> on at that time.
>>>>> If you have the time to monitor only startup, it will help me shrinking 
>>>>> the set of pstacks.
>>>>> Startup of DS last > 1min. If you may start DS and as soon as the 
>>>>> ns-slapd process is launched, do regular pstacks. Then when you are able 
>>>>> to send a simple ldapsearch (ldapsearch -x -b "" -s base), you may stop 
>>>>> taking pstacks.
>>>>> 
>>>>> thanks
>>>>> thierry
>>>>> 
>>>>> On 03/14/2016 03:06 PM, Daryl Fonseca-Holt wrote:
>>>>>> Hi Thierry,
>>>>>> 
>>>>>> I moved the old logs into a subdirectory called try1. I did the 
>>>>>> recommended ipa-server-install --uninstall. Tried the replica install 
>>>>>> again. Failed during kadmind start like the previous time.
>>>>>> 
>>>>>> The log from ipa-replica-install (with -d) is at 
>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipareplica-install.log
>>>>>> The console script (mostly the same as the log but with my entries) is 
>>>>>> at 
>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipa-replica-install.console
>>>>>> The 5 second pstacks are at 
>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/slapd-pstacks.console
>>>>>> 
>>>>>> Thanks, Daryl
>>>>>> 
>>>>>> 
>>>>>> On 03/11/16 02:40, thierry bordaz wrote:
>>>>>>> Hello Deryl,
>>>>>>>
>>>>>>>     My understanding is that ns-slapd is first slow to startup.
>>>>>>>     Then when krb5kdc is starting it may load ns-slapd.
>>>>>>>
>>>>>>>     We identified krb5kdc may be impacted by the number of users
>>>>>>>     accounts.
>>>>>>>     From the ns-slapd errors log it is not clear why it is so
>>>>>>>     slow to start.
>>>>>>>
>>>>>>>     Would you provide the ns-slapd  access logs from that period.
>>>>>>>     Also in order to know where ns-slapd is spending time, it
>>>>>>>     would really help if you can get regular (each 5s) pstacks
>>>>>>>     (with 389-ds-debuginfo), during DS startup and then later
>>>>>>>     during krb5kdc startup.
>>>>>>>
>>>>>>>     best regards
>>>>>>>     thierry
>>>>>>> 
>>>>>>> 
>>>>>>> On 03/10/2016 11:10 PM, Daryl Fonseca-Holt wrote:
>>>>>>>> Environment:
>>>>>>>>   RHEL 7.2
>>>>>>>>   IPA 4.2.0-15
>>>>>>>>   nss 3.19.1-19
>>>>>>>>   389-ds-base 1.3.4.0-26
>>>>>>>>   sssd 1.13.0-40
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I've encountered this problem in IPA 3.0.0 but hoped it was addressed 
>>>>>>>> in 4.2.0.
>>>>>>>> 
>>>>>>>> Trying to set up a replica of a master with 150,000+ user accounts, 
>>>>>>>> NIS and Schema Compatability enabled on the master.
>>>>>>>> 
>>>>>>>> During ipa-replica-install it attempts to start IPA. dirsrv starts, 
>>>>>>>> krb5kdc starts, but then kadmind fails because krb5kdc has gone 
>>>>>>>> missing.
>>>>>>>> 
>>>>>>>> This happens during restart of IPA in version 3.0.0 too. There it can 
>>>>>>>> be overcome by manually starting each component of IPA _but_ waiting 
>>>>>>>> until ns-slapd-<instance> has settled down (as seen from top) before 
>>>>>>>> starting krb5kdc. I also think that the startup of krb5kdc loads the 
>>>>>>>> LDAP instance quite a bit.
>>>>>>>> 
>>>>>>>> There is a problem in the startup logic where dirsrv is so busy that 
>>>>>>>> even though krb5kdc successfully starts and allows the kadmin to 
>>>>>>>> begin kdb5kdc is not really able to do its duties.
>>>>>>>> 
>>>>>>>> I'm reporting this since there must be some way to delay the start of 
>>>>>>>> krb5kdc and then kadmind until ns-slapd-<instance> is really open for 
>>>>>>>> business.
>>>>>>>> 
>>>>>>>> # systemctl status krb5kdc.service
>>>>>>>> ● krb5kdc.service - Kerberos 5 KDC
>>>>>>>>    Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; disabled; 
>>>>>>>> vendor preset: disabled)
>>>>>>>>    Active: inactive (dead)
>>>>>>>> 
>>>>>>>> Mar 10 14:19:13 jutta.cc.umanitoba.ca systemd[1]: Stopped Kerberos 5 
>>>>>>>> KDC.
>>>>>>>> Mar 10 14:20:36 jutta.cc.umanitoba.ca systemd[1]: Starting Kerberos 5 
>>>>>>>> KDC...
>>>>>>>> Mar 10 14:20:39 jutta.cc.umanitoba.ca systemd[1]: Started Kerberos 5 
>>>>>>>> KDC.
>>>>>>>> 
>>>>>>>> # systemctl status krb5kdc.service
>>>>>>>> ● krb5kdc.service - Kerberos 5 KDC
>>>>>>>>    Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; disabled; 
>>>>>>>> vendor preset: disabled)
>>>>>>>>    Active: inactive (dead)
>>>>>>>> 
>>>>>>>> Mar 10 14:19:13 jutta.cc.umanitoba.ca systemd[1]: Stopped Kerberos 5 
>>>>>>>> KDC.
>>>>>>>> Mar 10 14:20:36 jutta.cc.umanitoba.ca systemd[1]: Starting Kerberos 5 
>>>>>>>> KDC...
>>>>>>>> Mar 10 14:20:39 jutta.cc.umanitoba.ca systemd[1]: Started Kerberos 5 
>>>>>>>> KDC.
>>>>>>>> 
>>>>>>>> journalctl -xe was stale by the time I got to it so I've attached 
>>>>>>>> /var/log/messages instead.
>>>>>>>> 
>>>>>>>> The log from ipa-replica-install (with -d) is at 
>>>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipareplica-install.log
>>>>>>>> The console script (mostly the same as the log but with my entries) 
>>>>>>>> is at 
>>>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipa-replica-install.console 
>>>>>>>> The /var/log/dirsrv/ns-slapd-<instance> access log is at 
>>>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/access
>>>>>>>> 
>>>>>>>> Regards, Daryl
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>>   --
>>>>>>   Daryl Fonseca-Holt
>>>>>>   IST/CNS/Unix Server Team
>>>>>>   University of Manitoba
>>>>>>   204.480.1079
>>>>> 
>>>> 
>>>> --
>>>>   --
>>>>   Daryl Fonseca-Holt
>>>>   IST/CNS/Unix Server Team
>>>>   University of Manitoba
>>>>   204.480.1079
>>> 
>>> 
>>> 
>> 
>
> -- 
> --
> Daryl Fonseca-Holt
> IST/CNS/Unix Server Team
> University of Manitoba
> 204.480.1079
>
>


More information about the Freeipa-users mailing list