[Freeipa-users] ipa-replica-install IPA startup timing issue
Daryl Fonseca-Holt
Daryl.Fonseca-Holt at umanitoba.ca
Wed Mar 23 21:23:52 UTC 2016
Forgot to CC the ML. Sorry.
--
Daryl Fonseca-Holt
IST/CNS/Unix Server Team
University of Manitoba
204.480.1079
On Wed, 23 Mar 2016, Daryl Fonseca-Holt wrote:
> Hi Thierry,
>
> I have not filed a support request with RedHat for two reasons. First, it
> seems that the NIS priming may not be a problem in the post 4.2.0 release.
> Second, I am able to work around the problem by modifying the code where it
> uses the number of krb5kdc daemons to start thus alleviating the crush of 64
> daemons starting. As suggested by Alexander I'm patching
> ipaserver/install/krbinstance.py.
>
> # diff -c ipaserver/install/krbinstance.py.df
> ipaserver/install/krbinstance.py
> *** ipaserver/install/krbinstance.py.df 2015-06-18 07:54:49.000000000
> -0500
> --- ipaserver/install/krbinstance.py 2016-03-22 13:37:16.056210609 -0500
> ***************
> *** 355,360 ****
> --- 355,362 ----
>
> MIN_KRB5KDC_WITH_WORKERS = "1.9"
> cpus = os.sysconf('SC_NPROCESSORS_ONLN')
> + #XXX
> + cpus = 1
> workers = False
> (stdout, stderr, rc) = ipautil.run(['klist', '-V'],
> raiseonerr=False)
> if rc == 0:
>
>
> With this patch the constant in /etc/sysconfig/krb5kdc is lower. After the
> ipa-replica-install completes I will increase this manually to the original
> value.
>
> This has allowed the building phase of the project to continue.
>
> Thanks, Daryl
>
> On 03/23/16 04:40, thierry bordaz wrote:
>> Hi Daryl,
>>
>> Me again... :-)
>> As a follow up of this issue I would like to know if you already open a
>> case to RH support ?
>>
>> Also, have you identified a workaround to make ipa-replica-install
>> successful or are you still suffering from this issue ?
>>
>> best regards
>> thierry
>>
>> On 03/16/2016 04:24 PM, thierry bordaz wrote:
>>> Hello Daryl,
>>>
>>> I can reproduce locally the slow DS startup (due to slapi-nis
>>> priming). In fact the version I was using had not the slapi-nis
>>> fix to differ the priming.
>>>
>>> I failed to reproduce the intensive load on DS when krb5kdc startup.
>>> Looking at yours logs, we can see that krb5kdc startup triggers a
>>> set of requests during 3s up to 8s. The logs are looking like
>>> (note the etime can go up to 2s):
>>>
>>> [10/Mar/2016:14:20:35 -0600] conn=40 fd=87 slot=87 connection
>>> from local to /var/run/slapd-UOFMT1.socket
>>> [10/Mar/2016:14:20:36 -0600] conn=40 AUTOBIND dn="cn=Directory
>>> Manager"
>>> [10/Mar/2016:14:20:36 -0600] conn=40 op=0 BIND dn="cn=Directory
>>> Manager" method=sasl version=3 mech=EXTERNAL
>>> [10/Mar/2016:14:20:36 -0600] conn=40 op=0 RESULT err=0 tag=97
>>> nentries=0 *etime=1* dn="cn=Directory Manager"
>>> [10/Mar/2016:14:20:36 -0600] conn=40 op=1 SRCH
>>> base="cn=UOFMT1,cn=kerberos,dc=uofmt1" scope=0
>>> filter="(objectClass=*)" attrs=ALL
>>> [10/Mar/2016:14:20:36 -0600] conn=40 op=1 RESULT err=0 tag=101
>>> nentries=1 etime=0
>>> [10/Mar/2016:14:20:36 -0600] conn=40 op=2 SRCH
>>> base="cn=ipaConfig,cn=etc,dc=uofmt1" scope=0
>>> filter="(objectClass=*)" attrs="ipaConfigString ipaKrbAuthzData
>>> ipaUserAuthType"
>>> [10/Mar/2016:14:20:36 -0600] conn=40 op=2 RESULT err=0 tag=101
>>> nentries=1 etime=0
>>> [10/Mar/2016:14:20:36 -0600] conn=40 op=3 SRCH base="dc=uofmt1"
>>> scope=2 filter="(objectClass=ipaNTDomainAttrs)"
>>> attrs="ipaNTFlatName ipaNTFallbackPrimaryGroup
>>> ipaNTSecurityIdentifier"
>>> [10/Mar/2016:14:20:37 -0600] conn=40 op=3 RESULT err=0 tag=101
>>> nentries=0 *etime=1*
>>> [10/Mar/2016:14:20:37 -0600] conn=40 op=4 SRCH
>>> base="cn=UOFMT1,cn=kerberos,dc=uofmt1" scope=0
>>> filter="(krbMKey=*)" attrs="krbMKey"
>>> [10/Mar/2016:14:20:37 -0600] conn=40 op=4 RESULT err=0 tag=101
>>> nentries=1 etime=0
>>> [10/Mar/2016:14:20:37 -0600] conn=40 op=5 SRCH base="dc=uofmt1"
>>> scope=2
>>> filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincipal))(krbPrincipalName=K/M at UOFMT1))"
>>> attrs="krbPrincipalName krbCanonicalName ipaKrbPrincipalAlias
>>> krbUPEnabled krbPrincipalKey krbTicketPolicyReference
>>> krbPrincipalExpiration krbPasswordExpiration
>>> krbPwdPolicyReference krbPrincipalType krbPwdHistory
>>> krbLastPwdChange krbPrincipalAliases krbLastSuccessfulAuth
>>> krbLastFailedAuth krbLoginFailedCount krbExtraData
>>> krbLastAdminUnlock krbObjectReferences krbTicketFlags
>>> krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory
>>> ipaKrbAuthzData ipaUserAuthType ipatokenRadiusConfigLink objectClass"
>>> [10/Mar/2016:14:20:38 -0600] conn=40 op=5 RESULT err=0 tag=101
>>> nentries=1 *etime=1*
>>> [10/Mar/2016:14:20:39 -0600] conn=40 op=6 UNBIND
>>> [10/Mar/2016:14:20:39 -0600] conn=40 op=6 fd=87 closed - U1
>>>
>>>
>>> I think the request op=3 (SRCH base="dc=uofmt1" scope=2
>>> filter="(objectClass=ipaNTDomainAttrs)") is slow also because of
>>> slapi-nis. In fact it is indexed and returns 0 entry. So only
>>> plugins can create this high etime.
>>> An improvement in slapi-nis makes its search callback noop when
>>> it comes from krb and I am running this improvement.
>>>
>>> In conclusion I think both slow DS startup and KRB5 startup are
>>> fixed in RHEL 7.
>>>
>>> thanks
>>> theirry
>>>
>>>
>>> On 03/14/2016 08:46 PM, Daryl Fonseca-Holt wrote:
>>>> Hello Thierry,
>>>>
>>>> Attached is the pstacks from only the final DS restart. I don't think
>>>> they will show the whole picture.
>>>>
>>>> According to the debug log /var/log/ipareplica-install.log (attached) the
>>>> start of the krb5kdc.service (19:13:16Z) is successful, but the krb5kdc
>>>> log (attach) shows it is unable to fetch the master K/M at 14:31:31CDT
>>>> (-5hour offset). This is when the install log shows kadmind failing.
>>>>
>>>> In my experience with the master observing top there are two intense
>>>> times for ns-slapd-<instance>. The first when it start, of course, and
>>>> the second when krb5kdc starts. I assume this is because krb5kdc must get
>>>> it's configuration and data from the same DS. krb5kdc fails but the
>>>> ipareplica-install script isn't aware of it. Finally kadmin.service tries
>>>> to access krb5kdc and finds that it is dead.
>>>>
>>>> Please note these logs are with Schema Compatability and NIS plugins
>>>> turned off per the other e-mail from Alexander.
>>>>
>>>> I've noticed on a running master I can prevent this type of failure by
>>>> manually starting dirsrv (systemctl start dirsrv@<instance>.service),
>>>> watch top until all threads of ns-slapd have settled, then systemctl
>>>> start krb5kdc.service, again watching top until ns-slapd threads have
>>>> settled down before systemctl start kadmin.service. This kind of manual
>>>> intervention is is not possible when running the ipareplica-install
>>>> script.
>>>>
>>>> I will look into introducing a delay at the completion of the dirsrv and
>>>> krb5kdc systemd units and see if I can accommodate ipareplica-install.
>>>> Just as an experiment for now. I need to advance the project into High
>>>> Availability testing but cannot do so without a functioning replica.
>>>>
>>>> Regards, Daryl
>>>>
>>>> On 03/14/16 09:20, thierry bordaz wrote:
>>>>> Hi Daryl,
>>>>>
>>>>> Thanks for all the data. I will look at the pstacks. A first look shows
>>>>> that you capture import, bind... so may be a complete
>>>>> ipa-replica-install session.
>>>>> I will try to retrieve the specific startup time to see what was going
>>>>> on at that time.
>>>>> If you have the time to monitor only startup, it will help me shrinking
>>>>> the set of pstacks.
>>>>> Startup of DS last > 1min. If you may start DS and as soon as the
>>>>> ns-slapd process is launched, do regular pstacks. Then when you are able
>>>>> to send a simple ldapsearch (ldapsearch -x -b "" -s base), you may stop
>>>>> taking pstacks.
>>>>>
>>>>> thanks
>>>>> thierry
>>>>>
>>>>> On 03/14/2016 03:06 PM, Daryl Fonseca-Holt wrote:
>>>>>> Hi Thierry,
>>>>>>
>>>>>> I moved the old logs into a subdirectory called try1. I did the
>>>>>> recommended ipa-server-install --uninstall. Tried the replica install
>>>>>> again. Failed during kadmind start like the previous time.
>>>>>>
>>>>>> The log from ipa-replica-install (with -d) is at
>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipareplica-install.log
>>>>>> The console script (mostly the same as the log but with my entries) is
>>>>>> at
>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipa-replica-install.console
>>>>>> The 5 second pstacks are at
>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/slapd-pstacks.console
>>>>>>
>>>>>> Thanks, Daryl
>>>>>>
>>>>>>
>>>>>> On 03/11/16 02:40, thierry bordaz wrote:
>>>>>>> Hello Deryl,
>>>>>>>
>>>>>>> My understanding is that ns-slapd is first slow to startup.
>>>>>>> Then when krb5kdc is starting it may load ns-slapd.
>>>>>>>
>>>>>>> We identified krb5kdc may be impacted by the number of users
>>>>>>> accounts.
>>>>>>> From the ns-slapd errors log it is not clear why it is so
>>>>>>> slow to start.
>>>>>>>
>>>>>>> Would you provide the ns-slapd access logs from that period.
>>>>>>> Also in order to know where ns-slapd is spending time, it
>>>>>>> would really help if you can get regular (each 5s) pstacks
>>>>>>> (with 389-ds-debuginfo), during DS startup and then later
>>>>>>> during krb5kdc startup.
>>>>>>>
>>>>>>> best regards
>>>>>>> thierry
>>>>>>>
>>>>>>>
>>>>>>> On 03/10/2016 11:10 PM, Daryl Fonseca-Holt wrote:
>>>>>>>> Environment:
>>>>>>>> RHEL 7.2
>>>>>>>> IPA 4.2.0-15
>>>>>>>> nss 3.19.1-19
>>>>>>>> 389-ds-base 1.3.4.0-26
>>>>>>>> sssd 1.13.0-40
>>>>>>>>
>>>>>>>>
>>>>>>>> I've encountered this problem in IPA 3.0.0 but hoped it was addressed
>>>>>>>> in 4.2.0.
>>>>>>>>
>>>>>>>> Trying to set up a replica of a master with 150,000+ user accounts,
>>>>>>>> NIS and Schema Compatability enabled on the master.
>>>>>>>>
>>>>>>>> During ipa-replica-install it attempts to start IPA. dirsrv starts,
>>>>>>>> krb5kdc starts, but then kadmind fails because krb5kdc has gone
>>>>>>>> missing.
>>>>>>>>
>>>>>>>> This happens during restart of IPA in version 3.0.0 too. There it can
>>>>>>>> be overcome by manually starting each component of IPA _but_ waiting
>>>>>>>> until ns-slapd-<instance> has settled down (as seen from top) before
>>>>>>>> starting krb5kdc. I also think that the startup of krb5kdc loads the
>>>>>>>> LDAP instance quite a bit.
>>>>>>>>
>>>>>>>> There is a problem in the startup logic where dirsrv is so busy that
>>>>>>>> even though krb5kdc successfully starts and allows the kadmin to
>>>>>>>> begin kdb5kdc is not really able to do its duties.
>>>>>>>>
>>>>>>>> I'm reporting this since there must be some way to delay the start of
>>>>>>>> krb5kdc and then kadmind until ns-slapd-<instance> is really open for
>>>>>>>> business.
>>>>>>>>
>>>>>>>> # systemctl status krb5kdc.service
>>>>>>>> ● krb5kdc.service - Kerberos 5 KDC
>>>>>>>> Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; disabled;
>>>>>>>> vendor preset: disabled)
>>>>>>>> Active: inactive (dead)
>>>>>>>>
>>>>>>>> Mar 10 14:19:13 jutta.cc.umanitoba.ca systemd[1]: Stopped Kerberos 5
>>>>>>>> KDC.
>>>>>>>> Mar 10 14:20:36 jutta.cc.umanitoba.ca systemd[1]: Starting Kerberos 5
>>>>>>>> KDC...
>>>>>>>> Mar 10 14:20:39 jutta.cc.umanitoba.ca systemd[1]: Started Kerberos 5
>>>>>>>> KDC.
>>>>>>>>
>>>>>>>> # systemctl status krb5kdc.service
>>>>>>>> ● krb5kdc.service - Kerberos 5 KDC
>>>>>>>> Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; disabled;
>>>>>>>> vendor preset: disabled)
>>>>>>>> Active: inactive (dead)
>>>>>>>>
>>>>>>>> Mar 10 14:19:13 jutta.cc.umanitoba.ca systemd[1]: Stopped Kerberos 5
>>>>>>>> KDC.
>>>>>>>> Mar 10 14:20:36 jutta.cc.umanitoba.ca systemd[1]: Starting Kerberos 5
>>>>>>>> KDC...
>>>>>>>> Mar 10 14:20:39 jutta.cc.umanitoba.ca systemd[1]: Started Kerberos 5
>>>>>>>> KDC.
>>>>>>>>
>>>>>>>> journalctl -xe was stale by the time I got to it so I've attached
>>>>>>>> /var/log/messages instead.
>>>>>>>>
>>>>>>>> The log from ipa-replica-install (with -d) is at
>>>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipareplica-install.log
>>>>>>>> The console script (mostly the same as the log but with my entries)
>>>>>>>> is at
>>>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipa-replica-install.console
>>>>>>>> The /var/log/dirsrv/ns-slapd-<instance> access log is at
>>>>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/access
>>>>>>>>
>>>>>>>> Regards, Daryl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> --
>>>>>> Daryl Fonseca-Holt
>>>>>> IST/CNS/Unix Server Team
>>>>>> University of Manitoba
>>>>>> 204.480.1079
>>>>>
>>>>
>>>> --
>>>> --
>>>> Daryl Fonseca-Holt
>>>> IST/CNS/Unix Server Team
>>>> University of Manitoba
>>>> 204.480.1079
>>>
>>>
>>>
>>
>
> --
> --
> Daryl Fonseca-Holt
> IST/CNS/Unix Server Team
> University of Manitoba
> 204.480.1079
>
>
More information about the Freeipa-users
mailing list