[Freeipa-users] ipa-replica-install IPA startup timing issue

thierry bordaz tbordaz at redhat.com
Wed Mar 16 15:24:48 UTC 2016


Hello Daryl,

    I can reproduce locally the slow DS startup (due to slapi-nis
    priming). In fact the version I was using had not the slapi-nis fix
    to differ the priming.

    I failed to reproduce the intensive load on DS when krb5kdc startup.
    Looking at yours logs, we can see that krb5kdc startup triggers a
    set of requests during 3s up to 8s. The logs are looking like (note
    the etime can go up to 2s):

    [10/Mar/2016:14:20:35 -0600] conn=40 fd=87 slot=87 connection from
    local to /var/run/slapd-UOFMT1.socket
    [10/Mar/2016:14:20:36 -0600] conn=40 AUTOBIND dn="cn=Directory Manager"
    [10/Mar/2016:14:20:36 -0600] conn=40 op=0 BIND dn="cn=Directory
    Manager" method=sasl version=3 mech=EXTERNAL
    [10/Mar/2016:14:20:36 -0600] conn=40 op=0 RESULT err=0 tag=97
    nentries=0 *etime=1* dn="cn=Directory Manager"
    [10/Mar/2016:14:20:36 -0600] conn=40 op=1 SRCH
    base="cn=UOFMT1,cn=kerberos,dc=uofmt1" scope=0
    filter="(objectClass=*)" attrs=ALL
    [10/Mar/2016:14:20:36 -0600] conn=40 op=1 RESULT err=0 tag=101
    nentries=1 etime=0
    [10/Mar/2016:14:20:36 -0600] conn=40 op=2 SRCH
    base="cn=ipaConfig,cn=etc,dc=uofmt1" scope=0
    filter="(objectClass=*)" attrs="ipaConfigString ipaKrbAuthzData
    ipaUserAuthType"
    [10/Mar/2016:14:20:36 -0600] conn=40 op=2 RESULT err=0 tag=101
    nentries=1 etime=0
    [10/Mar/2016:14:20:36 -0600] conn=40 op=3 SRCH base="dc=uofmt1"
    scope=2 filter="(objectClass=ipaNTDomainAttrs)" attrs="ipaNTFlatName
    ipaNTFallbackPrimaryGroup ipaNTSecurityIdentifier"
    [10/Mar/2016:14:20:37 -0600] conn=40 op=3 RESULT err=0 tag=101
    nentries=0 *etime=1*
    [10/Mar/2016:14:20:37 -0600] conn=40 op=4 SRCH
    base="cn=UOFMT1,cn=kerberos,dc=uofmt1" scope=0 filter="(krbMKey=*)"
    attrs="krbMKey"
    [10/Mar/2016:14:20:37 -0600] conn=40 op=4 RESULT err=0 tag=101
    nentries=1 etime=0
    [10/Mar/2016:14:20:37 -0600] conn=40 op=5 SRCH base="dc=uofmt1"
    scope=2
    filter="(&(|(objectClass=krbprincipalaux)(objectClass=krbprincipal))(krbPrincipalName=K/M at UOFMT1))"
    attrs="krbPrincipalName krbCanonicalName ipaKrbPrincipalAlias
    krbUPEnabled krbPrincipalKey krbTicketPolicyReference
    krbPrincipalExpiration krbPasswordExpiration krbPwdPolicyReference
    krbPrincipalType krbPwdHistory krbLastPwdChange krbPrincipalAliases
    krbLastSuccessfulAuth krbLastFailedAuth krbLoginFailedCount
    krbExtraData krbLastAdminUnlock krbObjectReferences krbTicketFlags
    krbMaxTicketLife krbMaxRenewableAge nsAccountLock passwordHistory
    ipaKrbAuthzData ipaUserAuthType ipatokenRadiusConfigLink objectClass"
    [10/Mar/2016:14:20:38 -0600] conn=40 op=5 RESULT err=0 tag=101
    nentries=1 *etime=1*
    [10/Mar/2016:14:20:39 -0600] conn=40 op=6 UNBIND
    [10/Mar/2016:14:20:39 -0600] conn=40 op=6 fd=87 closed - U1


    I think the request op=3 (SRCH base="dc=uofmt1" scope=2
    filter="(objectClass=ipaNTDomainAttrs)") is slow also because of
    slapi-nis. In fact it is indexed and returns 0 entry. So only
    plugins can create this high etime.
    An improvement in slapi-nis makes its search callback noop when it
    comes from krb and I am running this improvement.

    In conclusion I think both slow DS startup and KRB5 startup are
    fixed in RHEL 7.

    thanks
    theirry


On 03/14/2016 08:46 PM, Daryl Fonseca-Holt wrote:
> Hello Thierry,
>
> Attached is the pstacks from only the final DS restart. I don't think 
> they will show the whole picture.
>
> According to the debug log /var/log/ipareplica-install.log (attached) 
> the start of the krb5kdc.service (19:13:16Z) is successful, but the 
> krb5kdc log (attach) shows it is unable to fetch the master K/M at 
> 14:31:31CDT (-5hour offset). This is when the install log shows 
> kadmind failing.
>
> In my experience with the master observing top there are two intense 
> times for ns-slapd-<instance>. The first when it start, of course, and 
> the second when krb5kdc starts. I assume this is because krb5kdc must 
> get it's configuration and data from the same DS. krb5kdc fails but 
> the ipareplica-install script isn't aware of it. Finally 
> kadmin.service tries to access krb5kdc and finds that it is dead.
>
> Please note these logs are with Schema Compatability and NIS plugins 
> turned off per the other e-mail from Alexander.
>
> I've noticed on a running master I can prevent this type of failure by 
> manually starting dirsrv (systemctl start dirsrv@<instance>.service), 
> watch top until all threads of ns-slapd have settled, then systemctl 
> start krb5kdc.service, again watching top until ns-slapd threads have 
> settled down before systemctl start kadmin.service.  This kind of 
> manual intervention is is not possible when running the 
> ipareplica-install script.
>
> I will look into introducing a delay at the completion of the dirsrv 
> and krb5kdc systemd units and see if I can accommodate 
> ipareplica-install. Just as an experiment for now. I need to advance 
> the project into High Availability testing but cannot do so without a 
> functioning replica.
>
> Regards, Daryl
>
> On 03/14/16 09:20, thierry bordaz wrote:
>> Hi Daryl,
>>
>> Thanks for all the data. I will look at the pstacks. A first look 
>> shows that you capture import, bind... so may be a complete 
>> ipa-replica-install session.
>> I will try to retrieve the specific startup time to see what was 
>> going on at that time.
>> If you have the time to monitor only startup, it will help me 
>> shrinking the set of pstacks.
>> Startup of DS last > 1min. If you may start DS and as soon as the 
>> ns-slapd process is launched, do regular pstacks. Then when you are 
>> able to send a simple ldapsearch (ldapsearch -x -b "" -s base), you 
>> may stop taking pstacks.
>>
>> thanks
>> thierry
>>
>> On 03/14/2016 03:06 PM, Daryl Fonseca-Holt wrote:
>>> Hi Thierry,
>>>
>>> I moved the old logs into a subdirectory called try1. I did the 
>>> recommended ipa-server-install --uninstall. Tried the replica 
>>> install again. Failed during kadmind start like the previous time.
>>>
>>> The log from ipa-replica-install (with -d) is at 
>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipareplica-install.log
>>> The console script (mostly the same as the log but with my entries) 
>>> is at 
>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipa-replica-install.console
>>> The 5 second pstacks are at 
>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/slapd-pstacks.console
>>>
>>> Thanks, Daryl
>>>
>>>
>>> On 03/11/16 02:40, thierry bordaz wrote:
>>>> Hello Deryl,
>>>>
>>>>     My understanding is that ns-slapd is first slow to startup.
>>>>     Then when krb5kdc is starting it may load ns-slapd.
>>>>
>>>>     We identified krb5kdc may be impacted by the number of users
>>>>     accounts.
>>>>     From the ns-slapd errors log it is not clear why it is so slow
>>>>     to start.
>>>>
>>>>     Would you provide the ns-slapd  access logs from that period.
>>>>     Also in order to know where ns-slapd is spending time, it would
>>>>     really help if you can get regular (each 5s) pstacks (with
>>>>     389-ds-debuginfo), during DS startup and then later during
>>>>     krb5kdc startup.
>>>>
>>>>     best regards
>>>>     thierry
>>>>
>>>>
>>>> On 03/10/2016 11:10 PM, Daryl Fonseca-Holt wrote:
>>>>> Environment:
>>>>>   RHEL 7.2
>>>>>   IPA 4.2.0-15
>>>>>   nss 3.19.1-19
>>>>>   389-ds-base 1.3.4.0-26
>>>>>   sssd 1.13.0-40
>>>>>
>>>>>
>>>>> I've encountered this problem in IPA 3.0.0 but hoped it was 
>>>>> addressed in 4.2.0.
>>>>>
>>>>> Trying to set up a replica of a master with 150,000+ user 
>>>>> accounts, NIS and Schema Compatability enabled on the master.
>>>>>
>>>>> During ipa-replica-install it attempts to start IPA. dirsrv 
>>>>> starts, krb5kdc starts, but then kadmind fails because krb5kdc has 
>>>>> gone missing.
>>>>>
>>>>> This happens during restart of IPA in version 3.0.0 too. There it 
>>>>> can be overcome by manually starting each component of IPA _but_ 
>>>>> waiting until ns-slapd-<instance> has settled down (as seen from 
>>>>> top) before starting krb5kdc. I also think that the startup of 
>>>>> krb5kdc loads the LDAP instance quite a bit.
>>>>>
>>>>> There is a problem in the startup logic where dirsrv is so busy 
>>>>> that even though krb5kdc successfully starts and allows the kadmin 
>>>>> to begin kdb5kdc is not really able to do its duties.
>>>>>
>>>>> I'm reporting this since there must be some way to delay the start 
>>>>> of krb5kdc and then kadmind until ns-slapd-<instance> is really 
>>>>> open for business.
>>>>>
>>>>> # systemctl status krb5kdc.service
>>>>> ● krb5kdc.service - Kerberos 5 KDC
>>>>>    Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; 
>>>>> disabled; vendor preset: disabled)
>>>>>    Active: inactive (dead)
>>>>>
>>>>> Mar 10 14:19:13 jutta.cc.umanitoba.ca systemd[1]: Stopped Kerberos 
>>>>> 5 KDC.
>>>>> Mar 10 14:20:36 jutta.cc.umanitoba.ca systemd[1]: Starting 
>>>>> Kerberos 5 KDC...
>>>>> Mar 10 14:20:39 jutta.cc.umanitoba.ca systemd[1]: Started Kerberos 
>>>>> 5 KDC.
>>>>>
>>>>> # systemctl status krb5kdc.service
>>>>> ● krb5kdc.service - Kerberos 5 KDC
>>>>>    Loaded: loaded (/usr/lib/systemd/system/krb5kdc.service; 
>>>>> disabled; vendor preset: disabled)
>>>>>    Active: inactive (dead)
>>>>>
>>>>> Mar 10 14:19:13 jutta.cc.umanitoba.ca systemd[1]: Stopped Kerberos 
>>>>> 5 KDC.
>>>>> Mar 10 14:20:36 jutta.cc.umanitoba.ca systemd[1]: Starting 
>>>>> Kerberos 5 KDC...
>>>>> Mar 10 14:20:39 jutta.cc.umanitoba.ca systemd[1]: Started Kerberos 
>>>>> 5 KDC.
>>>>>
>>>>> journalctl -xe was stale by the time I got to it so I've attached 
>>>>> /var/log/messages instead.
>>>>>
>>>>> The log from ipa-replica-install (with -d) is at 
>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipareplica-install.log
>>>>> The console script (mostly the same as the log but with my 
>>>>> entries) is at 
>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/ipa-replica-install.console
>>>>> The /var/log/dirsrv/ns-slapd-<instance> access log is at 
>>>>> http://home.cc.umanitoba.ca/~fonsecah/ipa/access
>>>>>
>>>>> Regards, Daryl
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>> -- 
>>>   --
>>>   Daryl Fonseca-Holt
>>>   IST/CNS/Unix Server Team
>>>   University of Manitoba
>>>   204.480.1079
>>
>
> -- 
>   --
>   Daryl Fonseca-Holt
>   IST/CNS/Unix Server Team
>   University of Manitoba
>   204.480.1079

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/freeipa-users/attachments/20160316/fe69f097/attachment.htm>


More information about the Freeipa-users mailing list